Nvidia's New AI Model Runs Locally—But There's a Catch
Nvidia just released Nemotron 3 Super for local use, but the Level1Techs team found something weird when they tested it. Context engineering is the new game.
Written by AI
Zara Chen
March 12, 2026

Photo: Level1Techs / YouTube
Nvidia just dropped Nemotron 3 Super, a 120-billion-parameter AI model you can actually download and run on your own hardware. Not in the cloud. Not through an API. Locally. Which sounds amazing until you realize what that actually means in practice—and what the Level1Techs team discovered when they started stress-testing it.
The specs are legitimately impressive: it's a mixture-of-experts architecture with 12 billion parameters active at any given time, which means you can run it on something like a Dell with 128GB of VRAM. Nvidia even released an FP4 version optimized for local deployment. The team at Level1Techs has been testing it alongside their own model, Kappa, and integrating both into Turnstone—an open-source AI orchestration platform they're building.
But here's where it gets interesting.
From Prompt Engineering to Context Engineering
The Level1Techs crew argues we're living through a fundamental shift in how AI systems work. "Remember prompt engineering? Everybody was talking about prompt engineering," they note in their breakdown. "No, prompt engineering is not really a thing. It is now context engineering."
This isn't just semantic wordplay. Context engineering means you're not just crafting the perfect question—you're managing the entire information environment the AI operates within. Files on your system. Your project history. Documentation. The command-line session itself. All of it becomes part of how the model understands what you're asking.
Turnstone, the orchestration platform they've built, is designed specifically for this. It lets you run multiple AI instances with different directives that interact with each other. One model can supervise another. A smaller model running locally can ping a larger one in the cloud. The system understands tool calling—meaning it can actually execute commands, pull documentation, or interact with your filesystem based on what you ask.
And crucially, it's architected with safety containers. When the AI decides it needs to run a command, that command executes inside a Docker container, not directly on your system. You approve or reject each action. The model doesn't realize it's sandboxed, which is probably for the best.
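The video doesn't show Turnstone's internals, but the pattern it describes is simple to sketch: a model-proposed command is shown to the human, and only an approved command is executed inside a disposable Docker container rather than on the host. The function name and flags below are illustrative assumptions, not Turnstone's actual API (the sketch assumes Docker is installed):

```python
import subprocess

def run_sandboxed(command: str, approved: bool, image: str = "ubuntu:24.04") -> str:
    """Run a model-proposed shell command in a throwaway Docker
    container instead of on the host (illustrative sketch).

    `approved` stands in for the human's accept/reject decision.
    """
    if not approved:
        # The rejection goes back to the model as the tool result.
        return "REJECTED: user denied the tool call"
    # --rm deletes the container afterwards; --network none keeps the
    # command from reaching the network while it runs.
    result = subprocess.run(
        ["docker", "run", "--rm", "--network", "none", image,
         "sh", "-c", command],
        capture_output=True, text=True, timeout=60,
    )
    # The model only ever sees this string, never the host filesystem.
    return result.stdout or result.stderr
```

From the model's point of view, the returned string is just the command's output, which is why it has no way to tell it's sandboxed.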
The Car Wash Problem
But then there's this absolutely fascinating thing they discovered while testing Nemotron 3 Super—what they're calling "the car wash problem."
Here's the scenario: You tell the AI your car is super dirty and you live next door to a car wash. Should you drive there or walk?
Nemotron 3 Super—this 120-billion-parameter beast—will often tell you to walk. It's better for the environment. Good exercise. Blah blah.
Except... if you walk to the car wash, you won't have your car with you. You know, the thing you're trying to wash.
The model sort of knows this is wrong. "There's actually clues in there that it realizes that you have to drive the car to the car wash," the Level1Techs team explains. If you push it a little, it'll course-correct. But even when you explicitly tell it to think carefully, it defaults to "walking is better for the environment."
Their theory? This is alignment training backfiring. During training, the model learned that walking is virtuous, driving is less virtuous, and helpfulness means suggesting the virtuous option. That well-meaning bias got baked into the weights, and now it creates these weird blind spots in reasoning.
What Makes a Model Actually Useful
Here's where things get counterintuitive: The Level1Techs team's own model, Kappa, is only 20 billion parameters. By raw specs, it should be dramatically worse than Nemotron 3 Super. But in certain situations, it outperforms the larger model.
Why? They trained Kappa with D&D-style character alignment—lawful neutral, true neutral, lawful evil (though they clarify that "lawful evil" here is more like Marvin from Hitchhiker's Guide than actual malevolence). This alternative alignment approach makes the model more willing to push back on bad ideas.
"The model is more useful and also pushes back on bad ideas and is better able to reason through those kinds of scenarios," they note. It doesn't "glaze you"—AI-speak for agreeing with everything you say to seem helpful. It'll tell you when your question doesn't make sense.
This raises uncomfortable questions about what we're actually optimizing for when we train these systems. Is a model that's been heavily aligned to human values actually better at reasoning? Or have we just made it better at telling us what we want to hear?
The Accessibility Angle
One genuinely exciting piece: you don't need enterprise hardware to run state-of-the-art AI locally anymore. Kappa runs in just 10GB of VRAM thanks to MXFP4 quantization. Smaller versions of Nemotron can run on an 8GB Jetson Orin Nano.
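The arithmetic behind those footprints is straightforward: FP4 stores each weight in 4 bits (half a byte), so a rough lower bound on weight memory is parameter count × 0.5 bytes. This back-of-the-envelope estimate ignores activations, KV cache, and runtime overhead:

```python
def fp4_weight_gb(params_billion: float) -> float:
    """Approximate weight memory at 4 bits per parameter, in GB.
    Ignores activations, KV cache, and runtime overhead."""
    bytes_total = params_billion * 1e9 * 0.5  # 4 bits = 0.5 bytes
    return bytes_total / 1e9

print(fp4_weight_gb(20))   # Kappa, 20B params  -> 10.0 GB
print(fp4_weight_gb(120))  # Nemotron 3 Super, 120B -> 60.0 GB
```

The 20B figure lines up with Kappa's reported 10GB footprint; the 120B figure shows why Nemotron 3 Super still wants a machine with on the order of 128GB of memory even in its FP4 form.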
The Level1Techs team has full guides up on their forums for getting both models running with Turnstone. It's literally a docker compose up away, apparently.
And that democratization matters—not just for indie developers or hobbyists, but for anyone who needs to run AI systems without sending proprietary data to cloud providers. Medical contexts. Legal work. Internal corporate tools. There are a thousand use cases where "it only works if you give us all your data" isn't acceptable.
What This Actually Means
Nvidia releasing open-weight models you can run locally is objectively good news. The more companies competing in this space, the better the tools get and the more accessible they become. That's straightforward.
But the car wash problem—and the broader alignment questions it represents—points to something trickier. We're building these incredibly sophisticated reasoning systems, but the training process that makes them safe and helpful also creates weird cognitive distortions. A 120-billion-parameter model stumbles on a question a human five-year-old would get right, not because it lacks intelligence, but because it's been optimized for something other than pure reasoning.
The Level1Techs team is working on this stuff in real-time, live-testing as their video goes up. They're running Nemotron 3 Super on Strix Halo hardware, debugging performance issues, documenting what works and what doesn't.
And honestly? That's probably where the most interesting discoveries happen—not in the official benchmark tests, but when people actually try to use these things for real work and notice the weird edge cases. The places where the model's training and your expectations collide in unexpected ways.
Context engineering might be the future. But we're still figuring out what that context should actually contain.
— Zara Chen
Watch the Original Video
Best 120b Model for Offline Use? Nemotron 3 Super Out Now
Level1Techs
16m 10s
About This Source
Level1Techs
Level1Techs is a rapidly growing YouTube channel that has established itself as a key player in the tech community since its launch in 2025. With over 512,000 subscribers, the channel provides in-depth analysis and discussions on technology, science, and design, aiming to educate and engage a technologically-inclined audience.