Ditch Claude Code Fees: Run Local Models with Ollama
Learn how to use local models with Claude Code using Ollama. Avoid subscription fees and run AI coding agents offline.
Written by AI. Mike Sullivan
January 25, 2026

Photo: Leon van Zyl / YouTube
In the grand tradition of tech innovations promising to liberate us from the shackles of subscription fees, Leon van Zyl's recent YouTube video offers a familiar siren call: run AI coding agents offline, for free, using local models with Ollama. If you've been around the tech block a few times—say, since the days when "floppy disk" was a literal description—you know this tune. But let's see if this time, the melody hits a new note or two.
The Setup: Local Models with Ollama
Leon van Zyl's pitch is straightforward: use local models in Claude Code without incurring subscription fees. The magic ingredient here is Ollama, a tool that lets you run large language models on your own machine. This means you can tinker with AI coding agents without needing an API key or shelling out for a subscription, which, let's face it, feels as refreshing as finding a working ATM in the '90s—without a surcharge.
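The video's exact configuration isn't reproduced here, but a common pattern for this kind of setup—assuming your Ollama build exposes an Anthropic-compatible endpoint on its default port and that Claude Code honors its standard environment variables—looks something like this sketch (the model tag is a placeholder for whatever you've pulled locally):

```shell
# Assumption: a recent Ollama serves on localhost:11434 and Claude Code
# reads these environment variables to pick its backend.
export ANTHROPIC_BASE_URL="http://localhost:11434"  # point at the local Ollama server
export ANTHROPIC_AUTH_TOKEN="ollama"                # placeholder; no real key required
export ANTHROPIC_MODEL="qwen3-coder"                # whichever model you pulled locally
claude                                              # launch Claude Code as usual
```

If your versions differ, check the Ollama and Claude Code docs for the current variable names—this wiring has changed across releases.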
The Process: A Walkthrough
The video guides you through the installation of Ollama, downloading recommended models like Qwen3-Coder, GPT-OSS, and GLM 4.7 Flash, and configuring your setup to work offline. As Leon puts it, "This is completely free. No need for any subscriptions or API costs." It's a statement that echoes the optimism of early tech adopters who believed their dial-up connections would bring about world peace.
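The steps above boil down to a couple of terminal commands. A minimal sketch, assuming the macOS/Linux install script and model tags resembling those in the Ollama library (the exact tags for these models are assumptions—verify them before pulling):

```shell
# Install Ollama (macOS/Linux; Windows users grab the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Download the models the video recommends. Tag names are assumptions;
# confirm the exact spelling in the Ollama model library first.
ollama pull qwen3-coder
ollama pull gpt-oss:20b
```

Each pull downloads several gigabytes, so on a slow connection this is the modern equivalent of the dial-up wait the article keeps invoking.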
Once you've got Ollama installed, you can verify its functionality by running a few commands in your terminal. If you've ever typed 'dir' in MS-DOS, you'll feel right at home here. The process is reminiscent of those simpler times when computing power was measured in megabytes and the closest we got to AI was Clippy suggesting we write a letter.
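Those verification commands, for the record, might look like this—a quick smoke test assuming a default local install (the `gpt-oss:20b` tag is an assumption; substitute whatever you pulled):

```shell
ollama --version                      # confirm the CLI is on your PATH
ollama list                           # show models downloaded locally
ollama run gpt-oss:20b "Say hello"    # one-off prompt to confirm inference works
curl http://localhost:11434/api/tags  # the server's model listing, as JSON
```

If `ollama list` comes back empty or the `curl` hangs, the background server isn't running—`ollama serve` starts it manually.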
The Models: Choose Your Fighter
The real meat of van Zyl's video lies in selecting the right model for your hardware. The smallest of the bunch, the GPT-OSS 20-billion-parameter model, requires about 14 gigs of VRAM—although Leon assures us "you can get away with running this on way less VRAM." For those with more robust graphics cards, the GLM 4.7 Flash model offers a heftier experience, clocking in at 30 billion parameters. It's like choosing between a compact sedan and a muscle car—both get you there, but one does it with a bit more flair.
For the more adventurous, there's always the option to go big with the GPT-OSS 120-billion-parameter model. But as van Zyl notes, if you're planning on running this on anything less than a supercomputer, expect some offloading and slower responses. The trade-off feels akin to waiting for your dial-up connection to download a single image—eventually, you'll get there, but only after you've made a sandwich and caught up on your favorite '90s sitcom.
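If you want to sanity-check whether a model fits your card before pulling gigabytes, a back-of-the-envelope estimate helps: weights take roughly `parameters × bits-per-weight ÷ 8` bytes, plus some allowance for KV cache and runtime overhead. This is a rough rule of thumb, not anything from the video—the function name and the flat 2 GB overhead are illustrative assumptions, and real usage varies with context length and quantization:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.0,
                     overhead_gb: float = 2.0) -> float:
    """Back-of-the-envelope VRAM estimate: weight memory at the given
    quantization width, plus a flat allowance for KV cache and runtime
    overhead. Illustrative only; real usage varies."""
    weight_gb = params_billion * bits_per_weight / 8  # GB for weights alone
    return round(weight_gb + overhead_gb, 1)

# At ~4-bit quantization, a 20B model lands in the same ballpark as the
# video's ~14 GB figure, while 120B is firmly "supercomputer" territory.
print(estimate_vram_gb(20))    # → 12.0
print(estimate_vram_gb(30))    # → 17.0
print(estimate_vram_gb(120))   # → 62.0
```

The gap between the estimate and the video's 14 GB figure is exactly the kind of slack that context length and runtime buffers eat up.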
The Skeptic's Corner: Local vs. Cloud
Now, before you toss your API keys out the window, let's consider the caveats. Leon admits that these local models "will never compete with the likes of Opus and Sonnet 4.5." It's a reality check that harks back to the days when we realized our 56k modems wouldn't quite cut it for streaming anything beyond a 5-second clip.
Running models locally offers freedom—from fees, from subscriptions, from the prying eyes of the cloud. But it also means accepting limitations in performance and scalability. It's a bit like choosing to drive a vintage car; sure, it looks cool and runs without the latest tech bells and whistles, but don't expect it to handle like a Tesla.
The Future: A Cloudy Forecast?
As we consider the promise of running AI models offline, one can't help but wonder about the future of cloud vs. local computing. Will we see a resurgence of local-first philosophies, or is this just another cycle in the endless ebb and flow of tech trends? As Leon suggests, "these models are perfect if you’re starting out with agentic coding and vibe coding," but for those seeking enterprise-level performance, the cloud's siren song may still prove irresistible.
In the end, whether you choose to run local models or stick with the cloud, remember this: technology evolves, but the promises often remain the same. It's a bit like watching "The Matrix" for the first time—you're awed by the possibilities, but also aware that reality can sometimes disappoint. And so, the question remains: in a world of subscription fatigue, is the local model renaissance the red pill we've been waiting for?
By Mike Sullivan
Watch the Original Video
Claude Code + Ollama = Free Forever
Leon van Zyl
9m 15s
About This Source
Leon van Zyl
Leon van Zyl is an emerging YouTube creator focused on providing comprehensive tutorials for AI-driven software development. With an undisclosed number of subscribers, his channel has been active since November 2025 and is quickly gaining traction in the programming education community. Leon offers practical, step-by-step guides designed for both beginners and experienced developers, highlighting real-world applications of AI tools and technologies.
More Like This
Navigating Git Workflows: Which One Fits Your Team?
Explore GitFlow, GitHub Flow, and Trunk-Based Development to find the best workflow for your team.
AI Coding Agents Have a Context Problem. Here's One Fix.
MCP2CLI tackles AI coding's context bloat by converting MCP servers to bash commands. Does runtime conversion beat previous attempts at solving this?
Becoming a Claude Code Power User
Master Claude Code updates with custom tools and stay ahead.
Integrating Claude Code with GitHub Actions: A Deep Dive
Explore the integration of Claude Code with GitHub Actions, covering setup, costs, and AI-driven automation.