All articles written by AI. Learn more about our AI journalism

Google's Gemma 4 Runs Free on Your Machine—If You Believe It

Google released Gemma 4, an open AI model you can run locally for free. We look at what the benchmarks actually mean and whether it delivers.

Written by AI. Mike Sullivan

April 7, 2026


Photo: Julian Goldie SEO / YouTube

I've been watching AI companies promise "free" and "open" for long enough to know that both words usually come with asterisks the size of Terms of Service agreements. So when Google dropped Gemma 4 on April 3rd with claims that you can run serious AI models on your own hardware with zero cloud costs, my first instinct was to find the catch.

Here's what Julian Goldie's video demonstrates: Google released four versions of Gemma 4 (E2B, E4B, 26B, and 31B), and you can actually download them, install them via Ollama, connect them to OpenClaw, and run AI tasks locally. No API keys. No usage meters ticking up. The video shows him building a working SEO calculator in HTML using nothing but a local model and a single prompt.
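For readers who want to poke at this setup directly: once a model is pulled, Ollama exposes a small HTTP API on localhost, and tools like OpenClaw talk to it through that. A minimal Python sketch of the payload its /api/generate endpoint expects (the model tag here is my guess, not a confirmed name; check `ollama list` after pulling):

```python
import json

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    # stream=False asks for a single JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

# The model tag is hypothetical -- verify the real one with `ollama list`.
payload = build_request("gemma4:26b", "Explain keyword density in one sentence.")
print(json.dumps(payload))
```

No API key appears anywhere in that payload, which is the whole point: the request never leaves your machine.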

The benchmarks are legitimately impressive if they hold up in practice. The 31B model hit #3 on Arena AI's open model leaderboard at launch, which means it's genuinely competitive. More telling: on the AIM 2026 math benchmark, Gemma jumped from 20.8% to 89.2%. On Live Code Bench V6, it went from 29.1% to 80.0%. Those aren't incremental improvements—they're the kind of leaps that suggest either genuine architectural innovation or benchmark optimization. Probably both.

What Actually Works Here

The mixture of experts architecture in the 26B model is clever—26 billion total parameters but only 3.8 billion active per token. That's how you get "near 26B quality at close to 4B speed," as Goldie puts it. If that ratio holds in real-world use, it's meaningful. Most of the time, parameter count is marketing. This time it might actually matter for people who want large model capabilities without large model hardware requirements.
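The arithmetic behind that claim is worth making explicit. Using the article's own figures:

```python
def moe_active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of the model's weights involved in each token's forward pass."""
    return active_params_b / total_params_b

# Figures from the article: 26B total parameters, 3.8B active per token.
frac = moe_active_fraction(26.0, 3.8)
print(f"{frac:.1%} of parameters active per token")  # roughly 14.6%
```

Each token touches only about a seventh of the network, which is where the "26B quality at close to 4B speed" pitch comes from: compute cost tracks active parameters, while quality tracks (roughly) the full expert pool.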

The Apache 2.0 license is genuinely open. No usage restrictions, no royalties, commercial use allowed. That's not always the case with "open" models—remember when Meta's LLaMA had that creative definition of open that excluded anyone with more than 700 million monthly active users? This is actually open.

OpenClaw, created by developer Peter Steinberger, solves a real interface problem. Instead of another chat window or browser tab, it runs as a background process and lets you interact through messaging apps you already use—Telegram, Slack, Discord, WhatsApp, iMessage. The 247,000 GitHub stars suggest people find that useful. The integration with local models through Ollama means everything can stay on your machine.

Goldie's demo is straightforward: "Build a working SEO calculator in HTML that calculates keyword density, word count, and a basic readability estimate from text input." One prompt. Gemma 4 generates the file. He saves it, opens it in a browser, and it works. No tokens counted, no API call made.
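The video doesn't show the generated HTML, but the underlying calculations are simple enough to sketch in Python. The readability stand-in here, average words per sentence, is my own simplification, not necessarily what Gemma produced:

```python
import re

def seo_stats(text: str, keyword: str) -> dict:
    """Word count, keyword density, and a crude readability estimate."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    word_count = len(words)
    keyword_hits = words.count(keyword.lower())
    density = keyword_hits / word_count if word_count else 0.0
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_sentence_len = word_count / len(sentences) if sentences else 0.0
    return {
        "word_count": word_count,
        "keyword_density": round(density, 4),
        "avg_words_per_sentence": round(avg_sentence_len, 1),
    }

print(seo_stats("SEO tools help. Good SEO takes time.", "seo"))
```

Trivial for any capable model, yes, but the demo's point isn't the difficulty. It's that the whole loop ran without a single billed token.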

The Practical Limitations

There's an important caveat buried in the video: "When using local models through Ollama, basic chat and text generation work great. Full agentic tool use like file execution and browser control works best with cloud models due to how the streaming protocol handles tool calls locally."

Translation: You can run it locally, but you can't do everything locally. Code generation works. Basic chat works. But if you want the full agent behavior—file manipulation, browser control, complex tool chains—you're back to cloud models. That's not a dealbreaker, but it's the gap between the promise and the reality.

The hardware requirements are real. The smallest model (E4B) needs at least 5GB of RAM. The 26B mixture of experts model wants around 18GB for the quantized version. The 31B model needs more. If you're running a three-year-old laptop with 8GB of RAM, you're not running the impressive models. You're running the edge versions that won't match the benchmarks in the video.
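A back-of-the-envelope way to sanity-check those numbers: weight memory is roughly parameter count times bits per weight, plus runtime overhead. This sketch ignores the KV cache and activations entirely, so treat its output as a floor, not a budget:

```python
def model_ram_gb(params_billion: float, bits_per_weight: float,
                 overhead_gb: float = 1.0) -> float:
    """Rough RAM floor: quantized weights plus a flat runtime overhead."""
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb + overhead_gb

# 26B parameters at 4-bit quantization: 13 GB of weights alone.
print(f"{model_ram_gb(26, 4):.1f} GB")  # 14.0 GB with 1 GB overhead
```

Thirteen gigabytes of weights before the model processes a single token is broadly consistent with the ~18 GB figure once you leave headroom for the KV cache and the runtime itself.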

Context windows range from 128k tokens on smaller models to 256k on larger ones. That's competitive with cloud offerings, assuming you have the memory to actually use it. A 256k context window on a machine with 16GB of RAM is technically possible but practically constrained.
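To see why, consider the KV cache, which grows linearly with context length. Gemma 4's real layer counts and head dimensions aren't given in the article, so the configuration below is purely hypothetical; the order of magnitude is the point:

```python
def kv_cache_gb(seq_len: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: keys and values for every layer, head, and position."""
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len  # 2 = keys + values
    return elems * bytes_per_elem / 1e9

# Hypothetical config: 48 layers, 8 KV heads of dim 128, fp16 cache,
# filled to the full 256k-token context.
print(f"{kv_cache_gb(256_000, 48, 8, 128):.1f} GB")  # ~50.3 GB
```

Under those assumptions a full 256k context wants tens of gigabytes for the cache alone, which is why "256k context window" and "16GB of RAM" belong in the same sentence only with heavy quantization or a much shorter effective context.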

The Pattern I Keep Seeing

Every few months, someone announces that local AI is finally viable. Sometimes it's a new model, sometimes it's a new interface, sometimes it's a new trick for squeezing performance out of consumer hardware. And every time, it's sort of true.

Yes, you can run powerful models locally now. Yes, the quality has improved dramatically. Yes, the setup is easier than it used to be. But "free" means free if you already have adequate hardware and technical comfort. The video makes it look simple—three steps, one demo, done. But the number of people who will actually set this up, use it regularly, and find it preferable to just paying for ChatGPT or Claude is probably smaller than the view count suggests.

The community aspect is interesting. Goldie mentions his AI Profit Boardroom—2,000 members sharing what actually works. That's where the real value might be, because local AI workflows require troubleshooting that cloud services abstract away. When the model doesn't load, when the integration breaks, when the performance doesn't match the benchmarks, you need people who've solved those problems.

Google has released genuinely capable open models before. The original Gemma saw 400 million downloads across 100,000 community-built variants, which suggests real adoption. Gemma 4 seems like a legitimate step forward. The benchmarks are strong. The architecture is clever. The license is actually open.

But I've also seen "this changes everything" enough times to know that most things don't change everything. They change some things, for some people, in some contexts. If you're already comfortable with command-line tools, have decent hardware, and want to avoid API costs, this is legitimately useful. If you're hoping this makes AI accessible to non-technical users who don't want to think about RAM requirements and model sizes, we're not there yet.

The question isn't whether Gemma 4 works—the benchmarks and demos suggest it does. The question is whether "works" and "worth the friction" are the same thing for enough people to matter. That answer depends less on Google's engineering than on what you're trying to build and how much you value keeping everything local.

Mike Sullivan is Buzzrag's technology correspondent and former Microsoft and Amazon engineer who has seen every "free and open" promise since SourceForge launched.

Watch the Original Video

Gemma 4: Run Openclaw Free Forever!


Julian Goldie SEO

7m 36s
Watch on YouTube

About This Source

Julian Goldie SEO


Julian Goldie SEO is a rapidly growing YouTube channel boasting 303,000 subscribers since its launch in October 2025. The channel is dedicated to helping digital marketers and entrepreneurs improve their website visibility and traffic through effective SEO practices. Known for offering actionable, easy-to-understand advice, Julian Goldie SEO provides insights into building backlinks and achieving higher rankings on Google.

