Cloud Run for agents: MCP servers GA, Blackwell GPUs and ephemeral sandboxes

Google is repositioning Cloud Run as a runtime platform for AI agents. With managed MCP servers in GA, NVIDIA RTX PRO 6000 Blackwell support for 70B+ parameter models, and integrated ephemeral sandboxes, the message is that production-grade infrastructure for agents has to scale to zero rather than inflate the bill.

Published: 2026-05-21T01:05:00+02:00 Topic area: How the web is evolving

The move: Cloud Run as a runtime for agents

At I/O 2026 Google repositioned Cloud Run from a generic serverless container service to a first-class runtime for AI agents. The news isn't a single announcement but a coordinated sequence that redefines the service's value proposition.

Managed MCP servers in General Availability

The first piece: Cloud Run now hosts managed Model Context Protocol servers in GA. For those not following the protocol, MCP is the open standard, initially proposed by Anthropic and now adopted by OpenAI, Google and others, that lets an agent discover and call external tools in a uniform way.

GA means two practical things: clear pricing and a contractual SLA. Developers and agents can deploy an MCP server with one command, expose it as a scalable endpoint and use it as a tool source inside Antigravity, ADK, Agent Platform or any compliant client.
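To make "discover and call external tools in a uniform way" concrete, here is a minimal sketch of the two messages an MCP client sends: one to list the available tools and one to invoke a tool. MCP messages use the JSON-RPC 2.0 envelope, and `tools/list` and `tools/call` are the protocol's method names. The tool name `get_weather` and its `city` argument are hypothetical, and the sketch only builds the payloads; it does not talk to Cloud Run or to any real server.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC needs an id to match responses to requests.
_ids = count(1)


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request, the envelope MCP messages use."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def list_tools_request():
    """Discovery step: ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool_request(name, arguments):
    """Invocation step: call one tool by name with a dict of arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    print(json.dumps(list_tools_request(), indent=2))
    # "get_weather" and "city" are hypothetical, for illustration only.
    print(json.dumps(call_tool_request("get_weather", {"city": "Rome"}), indent=2))
```

The point of the uniform shape is that the client code above does not change when the server behind the endpoint does: any compliant server answers the same two methods.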

NVIDIA RTX PRO 6000 Blackwell, scale-to-zero included

The second piece is infrastructural: Cloud Run adds GA support for NVIDIA RTX PRO 6000 Blackwell GPUs. Translation: you can serve 70-billion-parameter models and beyond without managing VMs, without manual orchestration, and — this is the point — with scaling to zero when there's no traffic.

This is the piece that makes the difference for those self-hosting open weights — Gemma 4, Llama, specialized models — instead of paying for APIs. Until now, self-hosting large models meant a GPU bill running even when nobody was calling. Scale-to-zero on Blackwell changes the economics.
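The change in economics is easiest to see with a back-of-the-envelope calculation. The sketch below compares an always-on GPU instance with one that is billed only while it serves traffic. The hourly rate and the three busy hours per day are hypothetical placeholders, not Cloud Run prices, and the model ignores cold starts and any minimum billing granularity.

```python
def monthly_gpu_cost(hourly_rate, busy_hours_per_day, days=30, scale_to_zero=True):
    """Estimate a monthly GPU bill.

    With scale_to_zero, only busy hours are billed; otherwise the
    instance is billed around the clock regardless of traffic.
    """
    billed_hours = busy_hours_per_day * days if scale_to_zero else 24 * days
    return hourly_rate * billed_hours


if __name__ == "__main__":
    rate = 4.0  # hypothetical price per GPU-hour, for illustration only
    always_on = monthly_gpu_cost(rate, busy_hours_per_day=3, scale_to_zero=False)
    on_demand = monthly_gpu_cost(rate, busy_hours_per_day=3, scale_to_zero=True)
    print(always_on)  # 2880.0
    print(on_demand)  # 360.0
```

Under these assumed numbers the bill tracks usage instead of the calendar, which is why the gap widens the burstier the workload is.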

Integrated ephemeral sandboxes

The third piece is about security. Cloud Run integrates an ephemeral sandbox tool that lets an agent spawn an execution environment isolated from its own code. The agent gets a clean Linux box, runs the risky step (a shell command, an eval of untrusted code, a browsing session) and the sandbox is discarded at the end of the task.

The pattern is the same one that Antigravity uses behind the scenes and that Google is now exposing as a reusable infrastructure primitive. It's the right answer to a real class of problems: agents executing arbitrary code have a massive attack surface, and confining each execution to an ephemeral container is sound practice.
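The lifecycle described above (create a clean environment, run the risky step, throw everything away) can be sketched locally with the Python standard library. This is not the Cloud Run sandbox API, which the article does not detail, and a temporary directory plus a child process is not a security boundary the way a container is. The function name `run_in_throwaway_dir` is hypothetical; the sketch only illustrates the create, run, discard pattern.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_throwaway_dir(code, timeout_s=10.0):
    """Run a Python snippet in a fresh working directory, then delete it.

    Illustrates the ephemeral lifecycle only. A real sandbox would add
    container or VM isolation, resource limits and network policy.
    """
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workdir:
        script = Path(workdir) / "task.py"
        script.write_text(code)
        proc = subprocess.run(
            [sys.executable, "-I", str(script)],  # -I: isolated interpreter mode
            cwd=workdir,            # files the snippet writes land in the temp dir
            capture_output=True,
            text=True,
            timeout=timeout_s,      # raises subprocess.TimeoutExpired if exceeded
        )
        result = (proc.returncode, proc.stdout, workdir)
    # Leaving the with-block removes the directory and everything in it.
    return result


if __name__ == "__main__":
    returncode, stdout, workdir = run_in_throwaway_dir("print(2 + 2)")
    print(returncode, stdout.strip())
    print(Path(workdir).exists())  # False: nothing survives the task
```

The design choice worth copying is that cleanup is tied to scope rather than to the task succeeding: the environment is discarded whether the step finishes, fails or times out.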

What it means for builders

Put together — managed MCP, scale-to-zero Blackwell GPUs, integrated sandboxes — the three pieces sketch an agent-native platform. You no longer need to wire together GPU orchestration, MCP hosting and sandboxing on your own: it all sits on Cloud Run with a consistent cost model.

The trade-off is the usual one: less integration friction, more dependence on Google Cloud. For startups that need to ship to production fast, the upside beats the lock-in cost. For anyone building something that needs to stay cloud-neutral, the warning still holds.