AI Coding Tools Might Freeze Dev Progress—Or Not
Sam Altman says AI models will adapt to new code. But tokenization, training data, and architecture suggest the problem is more fundamental than that.
Written by AI. Marcus Chen-Ramirez
February 4, 2026

Photo: Theo - t3.gg / YouTube
Here's a problem nobody saw coming: AI coding assistants might be really good at locking us into whatever frameworks and languages exist right now. Not because anyone designed them that way, but because of how these models fundamentally work.
Theo from t3.gg got to ask Sam Altman about this directly during a recent OpenAI livestream. The question was straightforward: are we building foundations that will be harder to swap later? "Even trying to get the current models to use the update to a technology that happened 2 years ago can feel like you're pulling teeth," Theo told Altman. "Do you think we'll be able to steer the models enough to get them to use new things or are we just done improving the technologies we build on now?"
Altman's answer was optimistic but vague in the way CEO answers often are. "I think we really will be very good at getting the models to use new things," he said. "A milestone that we will be very proud of is when the model can be presented with something totally new, new environment, new tools, new technology, whatever. And you can explain it once... and then just super reliably use that and get it right. And that doesn't feel very far away."
That last bit—"doesn't feel very far away"—is doing a lot of work. Because when you look at how these models actually operate, the problem isn't just about training them on new data. It's architectural.
The Compiler Problem
Theo has a useful metaphor for this: current AI models are more like compilers than runtimes. Once code is compiled, its capabilities are cemented. You can't add new functionality—you can only work with what was baked in. Runtimes, by contrast, can accept new code and execute it.
AI models, once trained, have frozen capabilities. They can't learn in the traditional sense. You can steer them with context—feeding them examples and documentation—but you're not teaching them. You're working around the edges of what they already know.
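The compiler-vs-runtime distinction can be sketched in a few lines of code. This is a toy illustration, not any real model API: the classes, methods, and return strings here are all hypothetical stand-ins for "frozen weights" and "loadable capabilities."

```python
class FrozenModel:
    """Like compiled code: capabilities are fixed at 'training' time."""
    def __init__(self, known_patterns):
        self.known = dict(known_patterns)  # frozen after init

    def complete(self, prompt, context=None):
        # Context can steer toward known patterns, but it can't add new ones:
        # it's a temporary overlay, gone on the next request.
        lookup = {**self.known, **(context or {})}
        return lookup.get(prompt, "<best guess from frozen weights>")


class Runtime:
    """Like a runtime: can accept and execute genuinely new code later."""
    def __init__(self):
        self.handlers = {}

    def load(self, name, fn):
        # New capability added after deployment, and it persists.
        self.handlers[name] = fn

    def run(self, name, *args):
        return self.handlers[name](*args)


model = FrozenModel({"<div>": "render a div"})
print(model.complete("<div>"))   # in-distribution: works
print(model.complete(":div:"))   # unfamiliar syntax: falls back to guessing

rt = Runtime()
rt.load("colon_syntax", lambda s: s.strip(":"))
print(rt.run("colon_syntax", ":div:"))  # the runtime gained a new ability
```

The asymmetry is the point: the `FrozenModel` can be steered per-request but never permanently extended, while the `Runtime` keeps whatever you load into it.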
The reason React works so well with current AI tools isn't mysterious. The syntax is close to both JavaScript and HTML. Components are encapsulated in ways that let models edit one without understanding the others. And there's a decade of React code scattered across the internet for these models to have trained on. The framework's consistency over time means the training data stays relevant.
But imagine you built a new framework—call it T3act—with different syntax. Instead of angle brackets, you use colons. The model's tokenization, the way it breaks code into chunks it can understand, is now fighting you. The mathematical weights pointing toward likely next tokens are all calibrated for the old syntax. You can provide context to steer it, but now you're burning tokens on translation instead of problem-solving.
Tokenization Is Politics
This gets technical fast, but it's worth understanding because it shows why this isn't just a "train on more data" problem.
When a model sees <div>hello, it doesn't see characters. It sees tokens—chunks that its training process determined were meaningful units. OpenAI has put real engineering work into making sure a closing bracket > stays as one token, because breaking it apart makes the model more likely to guess wrong about what comes next. GPT-5 keeps HTML elements together much better than GPT-3 did, which matters for code generation.
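A toy greedy tokenizer makes the effect concrete. Real tokenizers use byte-pair encoding learned over enormous corpora; this sketch just uses a fixed vocabulary tuned for HTML-like syntax to show why familiar patterns compress into fewer tokens than unfamiliar ones.

```python
# Hypothetical vocabulary: multi-character entries exist for the syntax
# the "training data" contained, single characters are the fallback.
VOCAB = ["<div>", "</div>", "<", ">", "div", "hello",
         ":", "d", "i", "v", "h", "e", "l", "o"]

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization over a fixed vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        match = max((v for v in vocab if text.startswith(v, i)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"untokenizable character: {text[i]!r}")
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("<div>hello"))  # ['<div>', 'hello'] -- 2 tokens
print(tokenize(":div:hello"))  # [':', 'div', ':', 'hello'] -- 4 tokens
```

The hypothetical colon syntax fragments into twice as many tokens as the angle-bracket syntax the vocabulary was built around, and every extra fragment is another chance for the model to guess the continuation wrong.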
But this tokenization is optimized for existing syntax patterns. New patterns break it. "If a model has a certain level of intelligence based on the weights that it has today, adding more context to point in a different way means that that intelligence is being overridden some amount," Theo explains. The more you steer, the less of the model's trained capability you're actually using.
And there's a cost ceiling. If you need 50,000 tokens of context just to teach the model your new framework before you can ask it to do anything useful, every request becomes dramatically more expensive and the model has far more to keep track of. The tokenization and training optimizations that made it efficient on familiar syntax count for less and less.
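The back-of-envelope math is stark. The per-token price below is a hypothetical placeholder, not any provider's real rate; the point is the ratio, not the absolute dollars.

```python
# Assumed price, purely illustrative.
PRICE_PER_1K_INPUT = 0.003  # $ per 1,000 input tokens

def request_cost(context_tokens, prompt_tokens=500):
    """Cost of one request: steering context plus the actual prompt."""
    return (context_tokens + prompt_tokens) / 1000 * PRICE_PER_1K_INPUT

baseline = request_cost(0)        # model already knows the framework
steered = request_cost(50_000)    # 50k tokens of framework docs per call

print(f"baseline: ${baseline:.4f} per request")
print(f"steered:  ${steered:.4f} per request")
print(f"ratio:    {steered / baseline:.0f}x")
```

Under these assumptions, carrying the docs on every call makes each request roughly a hundred times more expensive, before counting the attention the model now spends on the docs instead of the task.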
The Skills Experiment
Vercel recently tested two approaches for teaching AI agents about Next.js 16 features that weren't in training data. One approach embedded documentation directly in the agent's context (agent.md). The other used "skills"—a pull system where the model could choose which documentation to load.
The always-present documentation got 100% success rates. The skills system maxed out at 79%—and that was only when they explicitly told the model to use the skills. Without explicit instructions, skills didn't help at all.
This isn't encouraging for the "models will just learn new things" hypothesis. Even when given the documentation, models need to be explicitly directed to use it. And forcing everything into context isn't scalable—you hit token limits, costs explode, and the model's baseline intelligence gets diluted.
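Reduced to control flow, the two strategies Vercel compared look something like this. Every name here is a hypothetical stand-in, not Vercel's actual implementation; the stubs at the bottom are wired so the pull path only loads docs when nudged, mirroring the reported behavior.

```python
DOCS = "<framework docs for features newer than the training cutoff>"

def push_strategy(task):
    """agent.md style: docs ride along in context on every request."""
    context = [DOCS, task]
    return run_model(context)

def pull_strategy(task, nudge=False):
    """Skills style: the model decides whether to load docs via a tool."""
    context = [task] + (["Use the available skills."] if nudge else [])
    if model_wants_docs(context):  # the step that topped out at 79%
        context.insert(0, DOCS)
    return run_model(context)

# Stubs so the sketch runs. In the real experiment, "wanting the docs"
# is a model decision, which is exactly where the pull approach failed.
def model_wants_docs(context):
    return any("skills" in c for c in context)

def run_model(context):
    return DOCS in context  # did the docs actually reach the model?

print(push_strategy("upgrade to v16"))              # True: docs present
print(pull_strategy("upgrade to v16"))              # False: docs skipped
print(pull_strategy("upgrade to v16", nudge=True))  # True: docs loaded
```

The push path can't miss, which is why it scored 100%; the pull path inserts a decision point, and decision points fail.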
Theo experienced this firsthand with the Kimi K2.5 model and Tailwind v4. The model got so confused by the missing tailwind.config.js file (dropped in v4's move to CSS-based configuration) that it eventually gave up and ported the entire project back to Tailwind 3 just to work with something it understood.
What Changes, What Doesn't
Altman's right that models will get better at working with new things. Mixture-of-experts architectures, better context handling, smarter reasoning loops—these will help. The ratio of user-provided tokens to model-generated ones is already shifting dramatically with reasoning models. Where it used to be roughly 50-50, you can now type "fix it" with a screenshot and the model will pull relevant code, reason through solutions, and generate fixes with minimal input.
But there's a difference between "getting better at working around this" and "solving the fundamental problem." The fundamental problem is that these models don't learn—they're massive autocomplete engines with frozen knowledge, steerable through context manipulation.
That might be fine. After all, humans don't continuously retrain their neural networks either—we learn through exposure and practice, which is closer to what context-steering attempts to simulate. The question is whether there's a ceiling to how well that simulation works, and whether we'll hit it before AI coding tools become truly general-purpose.
What's certain is that frameworks and languages optimized for AI tools—modular, consistent, well-documented, with syntax that tokenizes cleanly—will have an enormous advantage. Which means the market pressure isn't neutral. We're not just using AI to code; we're potentially reshaping what coding looks like to accommodate AI's limitations.
That's not necessarily bad, but it's worth being clear-eyed about. Tools shaping the work they're used for is a dynamic older than programming itself; what's new is the speed, and the fact that the shaping is statistical rather than syntactic, which makes it harder to see.
Marcus Chen-Ramirez is a senior technology correspondent for Buzzrag, covering AI, software development, and the intersection of technology and society.
Watch the Original Video
I asked Sam Altman about the future of code
Theo - t3.gg
30m 43s

About This Source
Theo - t3.gg
Theo - t3.gg is a burgeoning YouTube channel that has quickly amassed a following of 492,000 subscribers since launching in October 2025. Headed by Theo, a passionate software developer and AI enthusiast, the channel explores the realms of artificial intelligence, TypeScript, and innovative software development methodologies. Notable for initiatives like T3 Chat and the T3 Stack, Theo has carved out a niche as a knowledgeable and engaging figure in the tech community.
Read full source profile

More Like This
Spec-Driven Development Tools Promise to Fix AI Coding
Tracer's Epic Mode tackles 'vibe coding' with structured specifications. But can better documentation really solve AI development's consistency problems?
AI Coding Agents Need Managers, Not Better Prompts
The shift from AI coding assistants to autonomous agents isn't a prompting problem—it's a supervision crisis. Here's what changes when AI stops suggesting and starts executing.
Anthropic's Claude Code Guide Shows What We're Doing Wrong
Anthropic published official Claude Code best practices. Stockholm tech consultant Ani breaks down five common mistakes slowing developers down.
What 1,600 Hours With Claude Code Actually Teaches You
Ray Amjad spent 1,600 hours with Claude Code and learned it's not about the AI—it's about understanding how you work. Here's what actually matters.