Google Dumps Custom License, Gemma 4 Goes Apache 2.0

Google’s been on a roll with Gemini, but if you want to actually run those models yourself, you’re stuck with whatever Google decides to serve. The Gemma line was supposed to fix that, but Gemma 3 launched over a year ago and it’s starting to feel dated.

Today, Gemma 4 drops. Four sizes, all optimized for local usage. And here’s the big one: Google finally listened to developer complaints and dumped the custom Gemma license in favor of Apache 2.0. No more weird restrictions, no more wondering if you’re allowed to do something. Just standard open source licensing.

The hardware story is interesting.

The two large variants are a 26B Mixture of Experts (MoE) model and a 31B Dense model. Google says both can run unquantized in bfloat16 on a single 80GB Nvidia H100 GPU. That’s a $20,000 card, sure, but it’s still local hardware. Quantize them down to lower precision and they should fit on consumer GPUs. I’ve seen that kind of claim before, but Google’s latency optimizations might actually make the result usable.
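For a rough sanity check on those numbers, here is a back-of-the-envelope sketch of weight memory at different precisions. It only multiplies parameter count by bytes per parameter; the int8 and int4 rows are my own illustrative assumptions (Google hasn't said which quantization formats it has in mind), and it ignores KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: weights only. KV cache, activations, and framework
# overhead are NOT counted, so treat these as lower bounds.
BYTES_PER_PARAM = {
    "bfloat16": 2.0,  # unquantized, as in Google's H100 claim
    "int8": 1.0,      # illustrative quantization level (assumption)
    "int4": 0.5,      # illustrative quantization level (assumption)
}

H100_MEMORY_GB = 80.0


def weight_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight size in GB, using 1 GB = 1e9 bytes."""
    return params_billion * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for name, size in [("26B MoE", 26.0), ("31B Dense", 31.0)]:
        for dtype in BYTES_PER_PARAM:
            gb = weight_gb(size, dtype)
            fits = "fits" if gb < H100_MEMORY_GB else "does not fit"
            print(f"{name:9s} {dtype:8s} ~{gb:5.1f} GB  ({fits} in 80 GB)")
```

By this estimate the 31B Dense model needs about 62 GB for bfloat16 weights and the 26B MoE about 52 GB, which is consistent with the single-H100 claim. At 4 bits per weight, the same arithmetic gives about 15.5 GB and 13 GB, which is the range where high-end consumer cards start to look plausible.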

The 26B MoE model only activates 3.8 billion of its 26 billion parameters during inference. That’s roughly 15 percent of the weights doing work on any given token, so tokens per second should be significantly higher than similarly sized dense models. The full 26 billion still have to sit in memory, though, so the savings are in compute, not VRAM. The 31B Dense variant is more about quality, and Google expects developers to fine-tune it for specific use cases. Makes sense: dense models generally perform better per parameter at smaller scales, but MoE wins on throughput.
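To put a number on the MoE advantage, here is a small sketch comparing per-token compute for the two models. It leans on the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass, which is my assumption and not a figure from Google. Real throughput also depends on memory bandwidth, routing overhead, and batch size, so read the ratio as a ceiling on the compute savings, not a benchmark.

```python
# Per-token compute comparison: 26B MoE (3.8B active) vs 31B Dense.
# Assumption: ~2 FLOPs per active parameter per token (forward pass),
# a widely used approximation, not a number published by Google.
MOE_TOTAL_B = 26.0    # total parameters, billions (from the announcement)
MOE_ACTIVE_B = 3.8    # parameters active per token, billions
DENSE_B = 31.0        # dense model: every parameter is active

FLOPS_PER_ACTIVE_PARAM = 2.0


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_b / total_b


def flops_per_token(active_b: float) -> float:
    """Approximate forward-pass FLOPs needed to produce one token."""
    return FLOPS_PER_ACTIVE_PARAM * active_b * 1e9


if __name__ == "__main__":
    frac = active_fraction(MOE_ACTIVE_B, MOE_TOTAL_B)
    ratio = flops_per_token(DENSE_B) / flops_per_token(MOE_ACTIVE_B)
    print(f"MoE active share per token: {frac:.1%}")
    print(f"Dense vs MoE compute per token: {ratio:.1f}x")
```

Under that assumption, the dense model does roughly eight times the arithmetic per token. Whether that turns into eight times the tokens per second is a separate question that only real benchmarks can answer.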

I’m curious to see how these stack up against Llama 3 and Mistral’s offerings. Google’s local model game has been inconsistent, but the Apache 2.0 switch removes a major barrier. Now it’s about actual performance, not licensing headaches.
