While the keynote's headline goes to Gemini 3.5 Flash, Google quietly announces Gemma 4: the fourth generation of its open-weight model family, built to run on-device or self-hosted as a direct alternative to proprietary cloud models.
The sizes
The new family ships in four sizes: E2B, E4B, 31B and 26B A4B. The lineup covers different scenarios, from inference on laptops and phones (E2B, E4B) up to self-hosting in a private cloud or on GPU servers (31B). The 26B A4B variant introduces a quantization scheme designed to run cleanly at 4-bit on consumer hardware, which in practice means more capable models running on a single desktop GPU.
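To make the 4-bit point concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. It assumes the parameter counts can be read straight off the model names (31 and 26 billion), which the announcement does not confirm, and it ignores the KV cache, activations and any per-block quantization metadata, so real-world figures will be somewhat higher. The function name `weight_memory_gb` is purely illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Nominal parameter counts taken from the model names (an assumption,
# not figures published in the announcement).
sizes = {"31B": 31.0, "26B A4B": 26.0}

for name, params in sizes.items():
    fp16 = weight_memory_gb(params, 16)  # typical unquantized precision
    int4 = weight_memory_gb(params, 4)   # 4-bit quantized weights
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{int4:.1f} GB at 4-bit")
```

Under these assumptions, 4-bit weights take a quarter of the 16-bit footprint, which is what would bring models of this size within reach of a single high-memory desktop GPU.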
What's new versus Gemma 3
The stated improvements center on instructions and code: more accurate instruction following, stronger code generation, and better handling of long prompts. Google AI Studio, Hugging Face and Kaggle are the three official download channels; the Gemma license remains free for commercial use.
Why it matters
For anyone working under privacy, data sovereignty or latency constraints, the availability of competitive open-weight models isn't a detail: it's the difference between going through a commercial API and hosting everything in-house. As the Google Developers blog noted, Gemma 4 positions itself explicitly as the "open companion" to the Gemini family, a way to cover both markets without forcing everyone onto Google's cloud. The real question, for whoever picks which model to adopt, is whether the quality gap to closed models stays acceptable for the specific use case.