AI Coding Tools Are Slot Machines, Not Software Engineers
Jeremy Howard argues AI coding assistance creates an illusion of control while producing minimal quality gains. His research shows a 'tiny uptick' in shipped code.
Written by AI. Samira Okonkwo-Barnes
March 4, 2026

Photo: Machine Learning Street Talk / YouTube
Jeremy Howard, the fast.ai founder who created ULMFiT, the transfer-learning method that helped establish modern language model fine-tuning, has a problem with how developers are using AI. Not with the technology itself, but with the illusion it creates.
"The thing about AI-based coding is that it's like a slot machine," Howard told Machine Learning Street Talk in a conversation published this week. "You have an illusion of control. You get to craft your prompt and your list of MCPs and your skills and whatever, but in the end you pull the lever, right? Here's a piece of code that no one understands."
This isn't a theoretical concern. Howard's team recently completed a study examining productivity gains from AI coding assistants. The result: "There's a tiny uptick in what people are actually shipping." Not the 50x improvements vendors promise. Not even close.
The gap between claimed productivity gains and shipped software matters because it reveals something fundamental about what these tools actually do—and what they don't.
What Gets Lost in Translation
Howard's critique hinges on the distinction between coding and software engineering. AI assistants can generate syntactically correct code efficiently. What they cannot do is understand the system that code will inhabit, the edge cases it must handle, the performance constraints it must respect, or the maintenance burden it will create.
"They're really bad at software engineering," Howard said. "And I think that's possibly always going to be true."
This isn't about whether language models understand code in some philosophical sense. It's about whether developers using these tools maintain the deep system knowledge required to evaluate what comes out. When you prompt an AI and receive hundreds of lines of code you didn't write, do you understand what it does well enough to ship it?
The answer increasingly seems to be: developers don't know, and that uncertainty is becoming normalized.
The Regularization Lesson
Howard's skepticism about AI-generated code comes from someone who helped create the very techniques that make modern language models possible. His 2018 ULMFiT paper demonstrated that models pre-trained on general text could be fine-tuned for specific tasks: the exact paradigm now powering ChatGPT and Claude.
The technical insight that made ULMFiT work was regularization: taking a maximally flexible model and constraining it through multiple techniques rather than reducing its size. This meant models could be as powerful as needed while remaining controllable.
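To make the idea concrete, here is a minimal sketch in plain Python—an illustration, not Howard's code. L2 weight decay stands in for the richer mix of constraints ULMFiT actually used (dropout variants, discriminative learning rates): the model keeps its full capacity, while the penalty discourages extreme weights at every step.

```python
import random

def fit(xs, ys, weight_decay=0.0, lr=0.01, steps=2000):
    """Gradient descent on y = w*x + b with optional L2 weight decay.

    weight_decay is the regularization knob: the model keeps its full
    flexibility, but large weights are penalized on every update.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * (grad_w + weight_decay * w)  # decay pulls w toward 0
        b -= lr * grad_b
    return w, b

# Noisy data from y ≈ 3x + 1
random.seed(0)
xs = [i / 10 for i in range(20)]
ys = [3 * x + 1 + random.gauss(0, 0.1) for x in xs]

w_plain, _ = fit(xs, ys)                   # unconstrained fit
w_reg, _ = fit(xs, ys, weight_decay=1.0)   # same model, constrained
```

Running both fits on the same data, the regularized weight comes out smaller: flexibility is retained, behavior is constrained, and the decay strength becomes a dial for trading raw fit against control.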
Howard trained the original ULMFiT model overnight on a single gaming GPU—a 2080 Ti—using Wikipedia as the general corpus. The next morning, he fine-tuned it on movie reviews for an hour, then spent "a few minutes" training a classifier that beat years of specialized PhD research.
The lesson wasn't just that transfer learning worked. It was that understanding what you're doing—looking at activations, examining gradients, knowing when neurons are dying—matters more than throwing compute at problems. Howard's fast.ai software includes built-in tools to visualize entire networks at a glance. "Once you've done it a few times (it just takes a couple of hours to learn), you can immediately see: oh, this is overtrained or undertrained, or something went wrong at this layer. It's not a mystery."
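That kind of inspection can be mechanical once it's wired into the training loop. The sketch below is a hypothetical helper, not fastai's actual API (fastai exposes similar per-layer monitoring through training callbacks): given recorded ReLU activations for each layer, it reports the mean activation magnitude and the fraction of units that never fire—the classic signature of dying neurons.

```python
def activation_report(layer_activations, dead_threshold=1e-6):
    """Summarize recorded activations per layer: mean magnitude and the
    fraction of units that are effectively dead (near-zero on every input).
    """
    report = []
    for name, batch in layer_activations.items():  # batch: one row per input
        n_units = len(batch[0])
        dead = 0
        for unit in range(n_units):
            if all(abs(row[unit]) < dead_threshold for row in batch):
                dead += 1
        flat = [abs(v) for row in batch for v in row]
        report.append((name, sum(flat) / len(flat), dead / n_units))
    return report

# Toy "recorded" ReLU activations: two units in layer2 are stuck at zero
acts = {
    "layer1": [[0.5, 1.2, 0.0], [0.3, 0.9, 0.4]],
    "layer2": [[0.0, 2.1, 0.0], [0.0, 1.7, 0.0]],
}
for name, mean_mag, dead_frac in activation_report(acts):
    print(f"{name}: mean |act| = {mean_mag:.2f}, dead = {dead_frac:.0%}")
```

A glance at a report like this is enough to spot the layer where something went wrong, which is exactly the kind of routine diagnostic Howard argues developers lose when code arrives fully formed from a prompt.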
That hands-on understanding is precisely what current AI coding tools discourage.
The Friction Problem
Modern development increasingly removes friction from the coding process. AI writes boilerplate. Copilot completes functions. Claude generates entire modules. Each removal of friction feels like productivity.
But Howard argues friction is where learning happens. "Whoever you listen to, whether it be Feynman or whoever, you always hear from the great scientists how they build deeper intuition by building mental models, which they get over time by interacting with the things they're learning about."
This connects to research on "desirable difficulty" in learning—the principle that making tasks harder in specific ways improves long-term retention and understanding. When AI removes all difficulty from coding, it may be removing the very mechanism that builds expertise.
The consequence isn't just individual skill degradation. Howard points to research on organizational knowledge and what computer scientist John Ousterhout calls the "slope versus intercept" problem: systems that are easy to start with but hard to extend versus systems that require upfront investment but scale cleanly. AI-generated code optimizes for low intercept—fast starts—while potentially creating unmaintainable systems with terrible slope.
Interactive Understanding
Howard's preferred development environment is notebooks and REPL environments—tools that let developers manipulate objects in real time, study them, move them around, combine them. This isn't about tooling preference. It's about maintaining the interactive loop that builds intuition.
"The idea that a human can do a lot more with a computer when the human can manipulate the objects inside that computer in real time and study them and move them around and combine them together," Howard explained. This approach directly opposes the "prompt once, receive solution" pattern of current AI coding assistants.
The policy question lurking beneath this technical debate: if AI coding tools are training a generation of developers who can't maintain or understand the systems they build, what happens when those systems fail? Who fixes critical infrastructure when the people nominally responsible for it learned to code by prompting?
The Nuance on Creativity
Howard's position on whether AI can be creative is more careful than his critics suggest. He draws a distinction between interpolative creativity—combining remembered elements in novel ways—and extrapolative creativity that moves outside the training distribution.
"LLMs are actually quite good at" interpolative creativity, Howard acknowledged. "But if it's, well, can they really extrapolate outside the training distribution? The answer is no, they can't."
The problem is knowing which type of creativity a given task requires. If you're doing R&D work at the edge of what's known—Howard's daily work—you quickly hit the limits. If you're building standard web applications, maybe interpolation suffices. The current trend toward "vibe coding" assumes everything is interpolation.
This assumption has regulatory implications. If AI tools are genuinely making software engineering more accessible, that's democratization worth supporting. If they're creating a class of developers who can generate code but cannot engineer systems, we're building technical debt that will eventually require policy intervention—either through licensing requirements, liability frameworks, or mandatory human review of AI-generated code in critical systems.
The METR study Howard references—examining AI performance on operating system development tasks—suggests current models still struggle with complex software engineering, even with substantial scaffolding and experienced developers involved. The tools are improving, but they're improving at generating code, not at doing software engineering.
Howard's fundamental point stands: "No one's actually creating 50 times more high-quality software than they were before." Until the gap between generated code and shipped software closes significantly, the slot machine metaphor remains uncomfortably accurate. You can craft your prompt carefully, but you're still pulling a lever and hoping for results no one fully understands.
Samira Okonkwo-Barnes covers technology policy and regulation for Buzzrag.
Watch the Original Video
The Dangerous Illusion of AI Coding? - Jeremy Howard
Machine Learning Street Talk
1h 26m

About This Source
Machine Learning Street Talk
Machine Learning Street Talk is a YouTube channel with 208,000 subscribers dedicated to long-form discussions of AI research, spanning cognitive science, computational models, and the philosophy of machine intelligence.
More Like This
MiniMax M2.5 Claims to Match Top AI Models at 5% the Cost
Chinese AI firm MiniMax releases M2.5, an open-source coding model claiming performance comparable to Claude and GPT-4 at dramatically lower prices.
When AI Needs to Invent Problems Before Solving Them
Robert Lange's Shinka Evolve shows why AI systems that optimize fixed problems may be missing the point. Real discovery requires co-evolving questions.
BMAD V6 Launches AI Development Platform Without Guardrails
BMAD V6 transforms AI coding into a modular platform, promising enterprise customization while raising questions about accountability and safety.
Cline CLI 2.0: Open-Source AI Coding Tool Goes Terminal
Cline CLI 2.0 brings AI-powered coding to the terminal with model flexibility and multi-tab workflows. But open-source AI tools raise questions.