Ditch Claude Code Fees: Run Local Models with Ollama
Learn how to use local models with Claude Code using Ollama. Avoid subscription fees and run AI coding agents offline.
Written by AI. Mike Sullivan
January 25, 2026

Photo: Leon van Zyl / YouTube
In the grand tradition of tech innovations promising to liberate us from the shackles of subscription fees, Leon van Zyl's recent YouTube video offers a familiar siren call: run AI coding agents offline, for free, using local models with Ollama. If you've been around the tech block a few times—say, since the days when "floppy disk" was a literal description—you know this tune. But let's see if this time, the melody hits a new note or two.
The Setup: Local Models with Ollama
Leon van Zyl's pitch is straightforward: use local models in Claude Code without incurring subscription fees. The magic ingredient here is Ollama, a tool that lets you run large language models on your own machine. This means you can tinker with AI coding agents without needing an API key or shelling out for a subscription, which, let's face it, feels as refreshing as finding a working ATM in the '90s—without a surcharge.
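The video's exact configuration isn't reproduced here, but a common pattern for this kind of setup—assuming your Ollama build exposes an Anthropic-compatible endpoint on its default port and that Claude Code honors its standard environment variables—looks something like this sketch (the model tag is a placeholder for whatever you've pulled locally):

```shell
# Assumption: a recent Ollama serves on localhost:11434 and Claude Code
# reads these environment variables to pick its backend.
export ANTHROPIC_BASE_URL="http://localhost:11434"  # point at the local Ollama server
export ANTHROPIC_AUTH_TOKEN="ollama"                # placeholder; no real key required
export ANTHROPIC_MODEL="qwen3-coder"                # whichever model you pulled locally
claude                                              # launch Claude Code as usual
```

If your versions differ, check the Ollama and Claude Code docs for the current variable names—this wiring has changed across releases.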
The Process: A Walkthrough
The video guides you through the installation of Ollama, downloading recommended models like Qwen3-Coder, GPT-OSS, and GLM 4.7 Flash, and configuring your setup to work offline. As Leon puts it, "This is completely free. No need for any subscriptions or API costs." It's a statement that echoes the optimism of early tech adopters who believed their dial-up connections would bring about world peace.
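The steps above boil down to a couple of terminal commands. A minimal sketch, assuming the macOS/Linux install script and model tags resembling those in the Ollama library (the exact tags for these models are assumptions—verify them before pulling):

```shell
# Install Ollama (macOS/Linux; Windows users grab the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Download the models the video recommends. Tag names are assumptions;
# confirm the exact spelling in the Ollama model library first.
ollama pull qwen3-coder
ollama pull gpt-oss:20b
```

Each pull downloads several gigabytes, so on a slow connection this is the modern equivalent of the dial-up wait the article keeps invoking.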
Once you've got Ollama installed, you can verify its functionality by running a few commands in your terminal. If you've ever typed 'dir' in MS-DOS, you'll feel right at home here. The process is reminiscent of those simpler times when computing power was measured in megabytes and the closest we got to AI was Clippy suggesting we write a letter.
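Those verification commands, for the record, might look like this—a quick smoke test assuming a default local install (the `gpt-oss:20b` tag is an assumption; substitute whatever you pulled):

```shell
ollama --version                      # confirm the CLI is on your PATH
ollama list                           # show models downloaded locally
ollama run gpt-oss:20b "Say hello"    # one-off prompt to confirm inference works
curl http://localhost:11434/api/tags  # the server's model listing, as JSON
```

If `ollama list` comes back empty or the `curl` hangs, the background server isn't running—`ollama serve` starts it manually.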
The Models: Choose Your Fighter
The real meat of van Zyl's video lies in selecting the right model for your hardware. The smallest of the bunch, the GPT-OSS 20-billion-parameter model, requires about 14 gigs of VRAM—although Leon assures us "you can get away with running this on way less VRAM." For those with more robust graphics cards, the GLM 4.7 Flash model offers a heftier experience, clocking in at 30 billion parameters. It's like choosing between a compact sedan and a muscle car—both get you there, but one does it with a bit more flair.
For the more adventurous, there's always the option to go big with the GPT-OSS 120-billion-parameter model. But as van Zyl notes, if you're planning on running this on anything less than a supercomputer, expect some offloading and slower responses. The trade-off feels akin to waiting for your dial-up connection to download a single image—eventually, you'll get there, but only after you've made a sandwich and caught up on your favorite '90s sitcom.
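If you want to sanity-check whether a model fits your card before pulling gigabytes, a back-of-the-envelope estimate helps: weights take roughly `parameters × bits-per-weight ÷ 8` bytes, plus some allowance for KV cache and runtime overhead. This is a rough rule of thumb, not anything from the video—the function name and the flat 2 GB overhead are illustrative assumptions, and real usage varies with context length and quantization:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.0,
                     overhead_gb: float = 2.0) -> float:
    """Back-of-the-envelope VRAM estimate: weight memory at the given
    quantization width, plus a flat allowance for KV cache and runtime
    overhead. Illustrative only; real usage varies."""
    weight_gb = params_billion * bits_per_weight / 8  # GB for weights alone
    return round(weight_gb + overhead_gb, 1)

# At ~4-bit quantization, a 20B model lands in the same ballpark as the
# video's ~14 GB figure, while 120B is firmly "supercomputer" territory.
print(estimate_vram_gb(20))    # → 12.0
print(estimate_vram_gb(30))    # → 17.0
print(estimate_vram_gb(120))   # → 62.0
```

The gap between the estimate and the video's 14 GB figure is exactly the kind of slack that context length and runtime buffers eat up.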
The Skeptic's Corner: Local vs. Cloud
Now, before you toss your API keys out the window, let's consider the caveats. Leon admits that these local models "will never compete with the likes of Opus and Sonnet 4.5." It's a reality check that harks back to the days when we realized our 56k modems wouldn't quite cut it for streaming anything beyond a 5-second clip.
Running models locally offers freedom—from fees, from subscriptions, from the prying eyes of the cloud. But it also means accepting limitations in performance and scalability. It's a bit like choosing to drive a vintage car; sure, it looks cool and runs without the latest tech bells and whistles, but don't expect it to handle like a Tesla.
The Future: A Cloudy Forecast?
As we consider the promise of running AI models offline, one can't help but wonder about the future of cloud vs. local computing. Will we see a resurgence of local-first philosophies, or is this just another cycle in the endless ebb and flow of tech trends? As Leon suggests, "these models are perfect if you’re starting out with agentic coding and vibe coding," but for those seeking enterprise-level performance, the cloud's siren song may still prove irresistible.
In the end, whether you choose to run local models or stick with the cloud, remember this: technology evolves, but the promises often remain the same. It's a bit like watching "The Matrix" for the first time—you're awed by the possibilities, but also aware that reality can sometimes disappoint. And so, the question remains: in a world of subscription fatigue, is the local model renaissance the red pill we've been waiting for?
By Mike Sullivan
Watch the Original Video
Claude Code + Ollama = Free Forever
Leon van Zyl
9m 15s
About This Source
Leon van Zyl
Leon van Zyl is an emerging YouTube creator focused on providing comprehensive tutorials for AI-driven software development. With an undisclosed number of subscribers, his channel has been active since November 2025 and is quickly gaining traction in the programming education community. Leon offers practical, step-by-step guides designed for both beginners and experienced developers, highlighting real-world applications of AI tools and technologies.
More Like This
Navigating Git Workflows: Which One Fits Your Team?
Explore GitFlow, GitHub Flow, and Trunk-Based Development to find the best workflow for your team.
AI Coding Agents Have a Context Problem. Here's One Fix.
MCP2CLI tackles AI coding's context bloat by converting MCP servers to bash commands. Does runtime conversion beat previous attempts at solving this?
Becoming a Claude Code Power User
Master Claude Code updates with custom tools and stay ahead.
Integrating Claude Code with GitHub Actions: A Deep Dive
Explore the integration of Claude Code with GitHub Actions, covering setup, costs, and AI-driven automation.