Hybrid inference: Chrome brings Gemini Nano on-device to GA, Firebase AI Logic handles the cloud fallback

Local web inference in Chrome graduates to general availability, and Firebase AI Logic builds the bridge between on-device and cloud, now on iOS too. The browser stops being a passive AI client.

Published: 2026-05-20T04:08:00+02:00 Category: Web and AI Tags: Chrome, Firebase AI Logic, Gemini Nano, on-device, developers

One of the less flashy announcements at Google I/O 2026 is also one of the most important for anyone building a front end: hybrid inference is becoming standard, with web apps running a model on the user's device and reaching for the cloud only when they have to.

Local web inference in Chrome hits GA

**Local web inference** in Chrome — the ability for web apps to run models directly on the user's device — graduates to general availability. Under the hood is **Gemini Nano**, Google's ~4 GB on-device LLM that already powers native Chrome features like Help me write, incoming-message scam detection, page summaries and tab group suggestions. That same infrastructure is now available to any web app through the Prompt API.
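For orientation, here is a minimal sketch of what calling the Prompt API from a page might look like. It assumes the `LanguageModel` global with `availability()`, `create()`, `prompt()` and `destroy()` as described in Chrome's built-in AI documentation; the exact surface may differ by Chrome version, and the helper names `getLanguageModel` and `promptLocally` are hypothetical, not part of any library.

```typescript
// Shape of the Prompt API surface this sketch relies on (an assumption based on
// Chrome's built-in AI documentation; check the current docs before relying on it).
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

interface PromptSession {
  prompt(input: string): Promise<string>;
  destroy(): void;
}

interface LanguageModelLike {
  availability(): Promise<Availability>;
  create(): Promise<PromptSession>;
}

// Hypothetical helper: look up the LanguageModel global without assuming it exists.
function getLanguageModel(scope: object = globalThis): LanguageModelLike | null {
  const candidate = (scope as { LanguageModel?: LanguageModelLike }).LanguageModel;
  return candidate && typeof candidate.availability === "function" ? candidate : null;
}

// Hypothetical helper: answer on-device, or return null so the caller can go elsewhere.
async function promptLocally(
  text: string,
  scope: object = globalThis,
): Promise<string | null> {
  const lm = getLanguageModel(scope);
  if (lm === null) return null; // browser does not expose the API
  // Only use a model that is already on disk; this sketch never starts a download.
  if ((await lm.availability()) !== "available") return null;
  const session = await lm.create();
  try {
    return await session.prompt(text);
  } finally {
    session.destroy(); // release the session's resources
  }
}
```

Passing the scope as a parameter keeps the feature detection testable outside a browser. Returning `null` rather than throwing makes "no local model" an ordinary outcome the caller can route around.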

Firebase AI Logic and the cloud fallback

The bridge between on-device and cloud is **Firebase AI Logic**. The library exposes a single API: the app first tries local inference via Chrome's Prompt API, and if the device can't handle it — or the requested model isn't available — it falls back to a server-side Gemini provider. The fallback logic sits in the library, not in the developer's code. **Hybrid inference** is now available on iOS too, and the Android option has expanded to support Gemma 4.
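The fallback pattern the library encapsulates can be sketched in a few lines. This is not Firebase AI Logic's API: the mode names, `planBackends`, and `generateHybrid` are hypothetical, and the two providers are injected stand-ins for "on-device model" and "server-side Gemini". It only illustrates the routing described above: try local first, fall back to the cloud when the device can't serve the request.

```typescript
// Hypothetical routing modes, named for this sketch only.
type Mode = "prefer-local" | "local-only" | "cloud-only";
type Backend = "local" | "cloud";

// A provider is anything that can turn a prompt into text.
type Provider = (prompt: string) => Promise<string>;

// Pure routing decision: which backends to try, in order.
function planBackends(mode: Mode, localReady: boolean): Backend[] {
  if (mode === "cloud-only") return ["cloud"];
  if (mode === "local-only") return localReady ? ["local"] : [];
  // prefer-local: use the device when it is ready, keep the cloud as a safety net.
  return localReady ? ["local", "cloud"] : ["cloud"];
}

// Try each planned backend in turn and report which one produced the answer.
async function generateHybrid(
  prompt: string,
  mode: Mode,
  localReady: boolean,
  providers: Record<Backend, Provider>,
): Promise<{ backend: Backend; text: string }> {
  let lastError: unknown = new Error("no backend available for mode " + mode);
  for (const backend of planBackends(mode, localReady)) {
    try {
      return { backend, text: await providers[backend](prompt) };
    } catch (err) {
      lastError = err; // fall through to the next backend in the plan
    }
  }
  throw lastError;
}
```

Separating the pure plan from the async execution keeps the policy easy to test and to change. A privacy-sensitive feature, for example, could pass `"local-only"` and surface an error instead of silently sending data to a server.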

What it means for developers

The pitch is the classic edge story: low latency, zero cost for simple queries, and data that never leaves the device for sensitive cases. Trip.com is cited as an early adopter — it uses built-in AI to generate personalized travel summaries without server calls. There's a hidden cost worth flagging: Gemini Nano weighs nearly four gigabytes and Chrome downloads it silently, a choice that as of May 2026 has already drawn privacy criticism. The autonomy gain has a price in disk space and in compute pushed onto the client.

The net effect is clear: everything-through-a-server-API is no longer the only reasonable default.

