Google's Gemma 4 Ships With Apache 2 License—No Catches

Google's Gemma 4 arrives with full Apache 2 licensing, native multimodal support, and edge deployment capabilities. What changed, and what does it mean?

Written by AI. Dev Kapoor

April 2, 2026


Photo: Sam Witteveen / YouTube

Google just released Gemma 4, and the headline feature isn't the multimodality or the reasoning capabilities or even the edge optimization. It's the license.

Gemma 4 ships under an actual Apache 2 license—not a custom license with restrictive clauses, not an "open weights" framework with competitive moats built in. Just Apache 2. Which means you can take these models, modify them, fine-tune them, deploy them commercially, and Google has no legal say in what you do with them.

For context: previous Gemma releases came with licensing restrictions that frustrated developers enough that many chose Llama or Qwen instead, despite Google's technical capabilities. Sam Witteveen, who's been covering the Gemma line since its original release, frames the licensing shift as "Google basically saying, 'Okay, fine. We'll play the same terms as some of the other open model providers out there.'"

The timing matters. While Google is loosening restrictions, some Chinese model providers are pulling back on open releases. The competitive landscape for truly open models is shifting, and Google is making a bet that unrestricted access will drive adoption more effectively than protective licensing.

The Technical Package

Gemma 4 consists of four models split into two tiers: workstation models (a 31B parameter dense model and a 26B parameter mixture-of-experts model with 3.8B active parameters) and edge models (E2B and E4B, designed for phones, Raspberry Pis, and embedded devices).

What's architecturally interesting here is that Google isn't just scaling up previous approaches. The models ship with native audio processing, native vision, built-in reasoning, and function calling—all integrated at the architecture level rather than bolted on post-training. As Witteveen notes, "up until recently, most of these models were text only or at best text plus vision. If you want audio, you're kind of bolting on Whisper. You're bolting on some external ASR pipeline."

The edge models particularly demonstrate where Google is making architectural trade-offs. The audio encoder has been compressed from 681 million parameters down to 305 million—a cut of more than half that drops storage requirements from 390MB to 87MB. Frame duration improved from 160 milliseconds to 40 milliseconds, which should translate to noticeably more responsive transcription.
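A quick sanity pass over those figures makes the trade-off visible. The parameter and storage numbers below come from the article; the ratios are just derived arithmetic, not anything Google has published:

```python
# Derived arithmetic from the audio-encoder figures quoted above.
def reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return 1 - after / before

param_cut = reduction(681e6, 305e6)  # parameter count: cut by a bit over half
storage_cut = reduction(390, 87)     # on-disk size in MB: cut by over three quarters

print(f"parameters cut by {param_cut:.0%}, storage cut by {storage_cut:.0%}")
```

Notably, storage shrinks faster than the parameter count, which suggests the smaller encoder also ships at lower numeric precision—though the article doesn't confirm that.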

The vision encoder shrank from 300-350 million parameters to 150 million, but gained native aspect-ratio processing. Previous Gemma models struggled with OCR and document understanding partly because they handled unusual aspect ratios poorly. The new architecture processes images at their native aspect ratio, which matters significantly for real-world document workflows.

What Native Multimodality Actually Means

The practical difference between native multimodal support and bolted-on capabilities shows up in how you use the models. Witteveen's demo reveals you can toggle reasoning on and off with a simple parameter (enable_thinking=true/false), process images by passing them directly to the processor, and handle audio with the same unified interface.

More interestingly, the edge models support speech-to-translated-text within a single model running on-device. You can speak in one language and receive text output in another without touching the cloud. Witteveen demonstrates this with English-to-Japanese translation running on the smallest E2B model—not something you'd necessarily choose over a dedicated ASR pipeline, but useful if you're already running the model for other tasks and want to avoid additional dependencies.

The function calling implementation also differs from typical approaches. Rather than training models to be better at instruction-following and hoping they cooperate with your prompt template, Gemma 4 bakes function calling into the architecture from the start. According to Witteveen, this is "optimized for multi-turn agentic flows" and shows up meaningfully in agentic benchmarks.
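Even with function calling baked into the architecture, the application still has to declare tools and route calls back to real code. A minimal sketch, assuming an OpenAI-style JSON-schema tool declaration (the exact format Gemma 4 expects may differ, and `get_weather` is a hypothetical tool for illustration):

```python
# Hypothetical tool declaration in the common JSON-schema style.
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def dispatch(call: dict) -> str:
    """Route a model-emitted tool call back to application code.

    In a multi-turn agentic flow, the returned string would be appended
    to the conversation as a tool result for the model's next turn.
    """
    if call["name"] == "get_weather":
        return f"22°C and clear in {call['arguments']['city']}"
    raise ValueError(f"unknown tool: {call['name']}")

print(dispatch({"name": "get_weather", "arguments": {"city": "Tokyo"}}))
```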

The Mixture of Experts Approach

The 26B mixture-of-experts model uses an unusual architecture: 128 tiny experts with eight activated per token, plus one shared always-on expert. This contrasts with recent trends toward larger numbers of experts in other MoE models.

The trade-off delivers roughly the intelligence of a 27B dense model with the compute cost of a 4B model. It's runnable on consumer GPUs, and Google is releasing quantization-aware training (QAT) checkpoints so quality holds up even at lower precision.
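Back-of-envelope arithmetic makes that split concrete. The parameter counts are from the article; the memory lines are derived, and the 4-bit figure assumes the QAT checkpoints land near int4 precision, which the article doesn't state:

```python
# Derived arithmetic on the 26B MoE figures quoted above.
total_params = 26e9     # full weight count: what you pay in memory
active_params = 3.8e9   # 8 routed experts + 1 shared expert per token: compute cost

active_fraction = active_params / total_params       # ~15% of weights run per token
mem_bf16_gb = total_params * 2 / 1e9                 # 16-bit weights: ~52 GB
mem_int4_gb = total_params * 0.5 / 1e9               # 4-bit weights: ~13 GB

print(f"{active_fraction:.0%} active per token; "
      f"{mem_bf16_gb:.0f} GB at bf16 vs {mem_int4_gb:.0f} GB at int4")
```

This is why the QAT checkpoints matter: at 16-bit precision the model needs datacenter-class VRAM, while a 4-bit variant fits on a single high-end consumer GPU.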

For developers who don't want the MoE complexity, the 31B dense workstation model takes a different path—fewer layers than Gemma 3 but with architectural upgrades including value normalization and an attention mechanism optimized for long context. Both workstation models support 256K context windows, which is substantial for locally-run models.

Edge Deployment Reality Check

Witteveen ran the smallest models on a T4 GPU without issues. The larger models require more serious hardware—H100s or RTX 6000 Pro GPUs if you're running them unquantized. But Google is also enabling serverless deployment through Cloud Run with G4 GPUs (Nvidia RTX Pro 6000 with 96GB VRAM), where the models spin down to zero when not in use.

The edge models are designed for scenarios where you genuinely don't want cloud dependencies: voice-first AI assistants, embedded vision systems, on-device translation. With 128K context windows, vision, audio, function calling, and reasoning capabilities in models small enough to run with low latency, the use cases start looking practical rather than aspirational.

What This Enables (And What It Doesn't)

The Apache 2 licensing combined with the base model releases creates space for fine-tuning workflows that previous Gemma versions didn't support cleanly. Witteveen notes that "when you've got actually a very strong base model like the Gemma models have always had, you can really benefit from doing your own fine-tuning for particular use cases."

But the release also surfaces questions about what's not here. Only the two smaller edge models support audio—the workstation models are vision and text only. The architecture choices suggest Google is still figuring out where different capabilities belong in different model tiers.

And Witteveen suspects this isn't the complete Gemma 4 family: "I don't think that this is the full family of Gemma 4 models. I think we'll probably see more over the next few months or so." Which raises the standard open source question: how do you build production systems on models that might be superseded before you finish integrating them?

The licensing change removes one major friction point for developers who wanted to use Gemma but couldn't commit to restrictive terms. Whether that's enough to shift adoption patterns in a space where Llama, Qwen, and Mistral have already established ecosystems remains an empirical question.

—Dev Kapoor

Watch the Original Video

Gemma 4 Has Landed!


Sam Witteveen

18m 32s

About This Source

Sam Witteveen


Sam Witteveen, a prominent figure in artificial intelligence, engages a substantial YouTube audience of over 113,000 subscribers with his expert insights into the world of deep learning. With more than a decade of experience in the field and five years focusing on Transformers and Large Language Models (LLMs), Sam has been a Google Developer Expert for Machine Learning since 2017. His channel is a vital resource for AI enthusiasts and professionals, offering a deep dive into the latest trends and innovations in AI, such as Nvidia models and autonomous agents.
