Models, chips, and business models: Google chooses where to make margin
Eight articles trace the industrial spine of the event. The Gemini 3.5 family ships with three converging announcements: Gemini 3.5 Flash as the new default, beating Gemini 3.1 Pro on coding and agentic benchmarks at a fraction of the cost; Gemini Omni as the multimodal model that produces video from text, images, and audio; and Gemma 4 as the open-weight front, in four sizes. The Neural Expressive redesign of the Gemini app closes the consumer thread. On the chip front, Google explicitly splits training and inference for the first time: TPU 8t for frontier models, TPU 8i for agents. On the business-model front, three announcements say the same thing: the value isn't in the models, it's in the infrastructure that runs them. AI Ultra drops to $200, the Gemini API moves from prompts per day to compute consumed, and AI Pro now includes YouTube Premium Lite. Google is building its margin on inference.
Buried in the Gemini plan reshuffle is a detail worth nine dollars a month: anyone paying for AI Pro at $19.99 now gets YouTube Premium Lite included. A retention move dressed up as a gift.
Google rewrites Gemini's visual language with fluid animation, fresh typography, and responses that structure information instead of dumping it. Rolling out now on Android, iOS, and the web.
Google drops daily prompt caps and introduces a metric based on compute actually consumed. The quota refreshes every five hours under a weekly ceiling, and if you hit the limit you get rerouted to smaller models instead of being blocked.
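The metering logic described above can be pictured with a small sketch. It is only an illustration, not Google's implementation: the class name `ComputeQuota`, the abstract "compute units", the budget figures, and the rule that rerouted requests do not consume quota are all assumptions. Only the five-hour refresh, the weekly ceiling, and the reroute-instead-of-block behavior come from the announcement.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60 * 60        # five-hour refresh, as announced
WEEK_SECONDS = 7 * 24 * 60 * 60     # period of the weekly ceiling


@dataclass
class ComputeQuota:
    """Hypothetical tracker: budgets are in made-up 'compute units'."""
    window_budget: float            # assumed units per five-hour window
    weekly_budget: float            # assumed units per week
    window_start: float = 0.0
    week_start: float = 0.0
    window_used: float = 0.0
    week_used: float = 0.0

    def _refresh(self, now: float) -> None:
        # Reset each counter once its period has elapsed.
        if now - self.window_start >= WINDOW_SECONDS:
            self.window_start, self.window_used = now, 0.0
        if now - self.week_start >= WEEK_SECONDS:
            self.week_start, self.week_used = now, 0.0

    def route(self, cost: float, now: float) -> str:
        """Return the tier that serves a request costing `cost` units."""
        self._refresh(now)
        fits = (self.window_used + cost <= self.window_budget
                and self.week_used + cost <= self.weekly_budget)
        if not fits:
            # Assumption: the fallback is free and never blocks the user.
            return "smaller-model"
        self.window_used += cost
        self.week_used += cost
        return "requested-model"
```

With a window budget of 10 units and a weekly budget of 25, a second 6-unit request inside the same five-hour window is rerouted, while the same request after the refresh is served normally until the weekly ceiling is reached.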
At I/O 2026 Google announced Gemma 4, the new generation of its open-weight models built for on-device or self-hosted deployment. Four sizes from E2B to 31B, gains in code generation and instruction following, and a 27B variant optimized for 4-bit inference on consumer-grade hardware.
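Why 4-bit inference matters for consumer hardware comes down to simple arithmetic on weight storage. The sketch below is a back-of-the-envelope estimate, not a figure from Google: the helper `weight_gigabytes` is hypothetical, it counts weights only, and it ignores activations, the KV cache, and quantization overhead such as scales.

```python
def weight_gigabytes(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


# A 27B-parameter model at 16-bit versus 4-bit precision.
full_precision = weight_gigabytes(27_000_000_000, 16)
four_bit = weight_gigabytes(27_000_000_000, 4)
print(f"16-bit: {full_precision:.1f} GB, 4-bit: {four_bit:.1f} GB")
```

Under these assumptions the weights shrink from roughly 54 GB to roughly 13.5 GB, which is the difference between datacenter memory and a high-end consumer GPU.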
At the I/O 2026 developer keynote Google revisited its new eighth-generation TPUs: two distinct chips, one for training frontier models (8t) and one for serving agents (8i). It is the first time Google has split the two workloads.
On the Mountain View stage, Sundar Pichai introduced Gemini 3.5 Flash, the first model in the Gemini 3.5 family. The detail that matters: Flash beats Gemini 3.1 Pro on agentic and coding benchmarks at a fraction of the cost.
Positioning move: Google cuts $50 from its top plan, AI Ultra, which drops to $200, and introduces a $100/month mid-tier plan aimed at developers and knowledge workers, with five times the Pro plan's limits.
Announced by Sundar Pichai on the Mountain View stage, Gemini Omni is the model that reasons across all media. The first member of the family, Gemini Omni Flash, will arrive this summer on the Gemini app, YouTube Shorts, and Flow.