AI's Bitter Lesson: Reinvention or Repetition?
Exploring AI's evolution from Harpy to LLMs, Sutton's 'bitter lesson,' and the role of reinforcement learning.
Written by AI · Mike Sullivan
January 31, 2026

Photo: Welch Labs / YouTube
In 1971, while Led Zeppelin was busy redefining rock and roll, the U.S. government was quietly setting the stage for another kind of revolution: speech recognition. Out of that funding push came Harpy, a Carnegie Mellon system that could recognize a whopping 1,011 words with 95% accuracy, thanks to a giant knowledge graph packed with hand-crafted linguistic rules. It was the pinnacle of human ingenuity, right up until it wasn't.
Fast forward a decade, and Harpy's meticulously crafted knowledge graph was shelved like an old vinyl record, replaced by hidden Markov models. These newfangled models could learn from data, rather than relying on human-crafted grammar, and scaled much more efficiently. This shift was part of a broader lesson, one that AI pioneer Richard Sutton dubbed "the bitter lesson." According to Sutton, "general methods that leverage computation are ultimately the most effective, and by a large margin."
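To see why the statisticians won, it helps to look at what an HMM actually does: it scores how plausible a sequence of sounds is under probabilities learned from recordings, not rules written by linguists. Here's a minimal sketch of the forward algorithm at the heart of classic HMM recognizers; the three "phoneme" states and every number in it are invented for illustration, not taken from any real system.

```python
import numpy as np

# Toy HMM for a single word, with three hidden "phoneme" states.
# All probabilities here are made up; real recognizers estimate
# them from transcribed audio (e.g., with the Baum-Welch algorithm).
start = np.array([1.0, 0.0, 0.0])        # always begin in state 0
trans = np.array([[0.6, 0.4, 0.0],       # P(next state | current state)
                  [0.0, 0.7, 0.3],
                  [0.0, 0.0, 1.0]])
emit = np.array([[0.8, 0.2],             # P(acoustic symbol | state)
                 [0.3, 0.7],
                 [0.5, 0.5]])

def forward_likelihood(obs):
    """Probability of an observed symbol sequence under this word model."""
    alpha = start * emit[:, obs[0]]
    for symbol in obs[1:]:
        alpha = (alpha @ trans) * emit[:, symbol]
    return alpha.sum()

# A recognizer keeps one model per word and picks whichever
# scores the incoming audio highest.
print(forward_likelihood([0, 1, 1]))
print(forward_likelihood([1, 0, 0]))
```

The punchline isn't the arithmetic. It's that every entry in trans and emit can be re-estimated from more data, so more recordings and more compute buy a better model automatically, which is exactly the scaling property Harpy's hand-built graph lacked.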
But wait, there's more. In 2019, Sutton's essay dropped like a surprise album, just as OpenAI introduced GPT-2, sparking a new wave of enthusiasm for large language models (LLMs). These models, trained with massive computational power, seemed to align perfectly with Sutton's bitter lesson—or did they?
Sutton later clarified that LLMs might actually be a negative example of his principle. Despite their computational prowess, they heavily lean on human-generated text, akin to Harpy's dependency on human knowledge. "We want AI agents that can discover like we can, not which contain what we have discovered," Sutton argues.
This brings us to the role of reinforcement learning (RL), a method that allows AI to learn from experience, much like how we learn not to touch a hot stove. Google's DeepMind demonstrated RL's potential with AlphaGo, a Go-playing AI that didn't just mimic human strategies but invented its own, playing like an "alien from an alternate dimension."
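If "learning from experience" sounds abstract, here is roughly what it looks like in code: a minimal sketch of tabular Q-learning, the textbook trial-and-error update that underpins much of modern RL. The two-state "hot stove" world, the rewards, and the hyperparameters are all invented for illustration; AlphaGo layered deep networks and self-play on top of ideas like this at enormous scale.

```python
import random

# An invented toy world: state 0 = near the stove, state 1 = standing back.
# Actions: 0 = touch the stove, 1 = step away.
def step(state, action):
    if action == 0:
        return 0, -10.0   # touching the hot stove hurts, every time
    return 1, 1.0         # stepping away earns a small reward

# Q[state][action]: the agent's running estimate of long-term value.
Q = [[0.0, 0.0], [0.0, 0.0]]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

state = 0
for _ in range(5000):
    # Epsilon-greedy: mostly act on current estimates, sometimes explore.
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = 0 if Q[state][0] > Q[state][1] else 1
    next_state, reward = step(state, action)
    # The Q-learning update: nudge the estimate toward the observed
    # reward plus the discounted value of the best next action.
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state

print(Q)  # "touch stove" ends up valued far below "step away"
```

Notice that nothing in the loop consults a human: the agent's entire "knowledge" is whatever the reward signal carved into that table. That, in miniature, is the discovery-over-imitation distinction Sutton keeps pressing.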
Despite its success in controlled environments like games, RL's application in the messier real world remains a question mark. In their 2025 essay "Welcome to the Era of Experience," Sutton and David Silver argue that we're on the brink of a new era in which AI learns from real-world interactions rather than human input. But let's be honest: until an AI can navigate the DMV without a meltdown, the jury's still out.
So where does this leave us? It seems we're caught in the age-old tech cycle of reinvention and repetition. Will LLMs become another Harpy, limited by their reliance on human knowledge? Or will they evolve, leveraging RL to break free from their current constraints?
In the words of The Who: "Meet the new boss, same as the old boss." Or maybe not. Only time, and perhaps a little bit of machine learning, will tell.
— Mike Sullivan
Watch the Original Video
Can humans make AI any better?
Welch Labs · 23m 39s

About This Source
Welch Labs is a YouTube channel boasting over 832,000 subscribers, dedicated to explaining the complexities of artificial intelligence (AI) since 2024. By focusing on topics such as AI speech recognition, reinforcement learning, and large language models, the channel caters to an audience eager to explore the depths of AI development and its implications.