PewDiePie Tried to Train an AI Model and Made It Worse
YouTuber PewDiePie documented his chaotic journey training a coding AI model from scratch, a master class in what machine learning actually looks like when you're learning it.
By Marcus Chen-Ramirez
March 10, 2026

Photo: The PrimeTime / YouTube
PewDiePie—yes, that PewDiePie—spent months training an AI coding model. He nearly burned down his house twice. He made the model worse. Then he made it even worse. And somehow, watching someone with zero machine learning background document this entire mess might be the most honest glimpse into AI development we've gotten all year.
In a video that streamer ThePrimeTime reacted to, Felix Kjellberg (PewDiePie's actual name) walks through his attempt to fine-tune Qwen 32B, a Chinese open-source coding model, with the goal of beating ChatGPT-4 on a specific benchmark. The journey is equal parts education and cautionary tale, a reminder that beneath all the hype about AI, there's still just a lot of extremely finicky data plumbing.
The Setup: Naïve Ambition Meets Open Source Reality
PewDiePie's pitch was straightforward enough: take an existing model that's already good at coding and make it better at a specific task. His target was the Aider Polyglot benchmark, which tests coding ability across six programming languages. ChatGPT-4 scored 18.2% on this benchmark. The base Qwen model he started with? 8%.
But there was a catch. If you changed the output format from "diff" (showing only what changed) to "whole" (regenerating the entire file), the same model jumped to 16%. As PewDiePie explained: "It's basically like this. You draw a picture. Okay, imagine you draw a picture and you want to add a cloud. But instead of just adding the [expletive] cloud, you redraw the entire picture with the cloud. It makes no [expletive] sense."
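To make the size difference concrete, here's a minimal sketch (my own illustration, not anything from the video) comparing the two formats on a synthetic 200-line file with a single changed line:

```python
import difflib

# Build a fake 200-line source file and patch exactly one line.
original = [f"line {i}: value = {i}\n" for i in range(200)]
edited = list(original)
edited[57] = "line 57: value = 999  # patched\n"

# "whole" format: the model must regenerate every line, changed or not.
whole_output = "".join(edited)

# "diff" format: only the hunk around the change is emitted.
diff_output = "".join(difflib.unified_diff(
    original, edited, fromfile="a/file.py", tofile="b/file.py"))

print(f"whole format: {len(whole_output)} chars")
print(f"diff format:  {len(diff_output)} chars")
```

On a real codebase the "whole" output grows linearly with file size while the diff stays roughly proportional to the edit, which is why regenerating the entire file to add one cloud makes no sense.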
So the plan: fix the format issue, add better training data, beat ChatGPT. Simple.
Except nothing about AI training is simple, and PewDiePie was about to learn this the expensive way—both in GPU compute hours and in his own sanity.
The Data Problem: Garbage In, Worse Garbage Out
To train an AI model, you need data. Lots of it. PewDiePie explored every option: mining The Stack (a 60TB dataset of code), scraping GitHub repos with MIT licenses, using publicly available datasets, and—most ambitiously—generating synthetic data.
Synthetic data is where you take a strong AI model and ask it to create more training examples in your desired format. "You get the perfect data exactly the way you want it. It's amazing," PewDiePie said. "But the problem is, and maybe you already know this, AI is wrong all the [expletive] time."
He illustrated the problem with a burger analogy: you show the AI a burger, it generates what looks like a burger, but when you open it up, there are razor blades inside. So he built a "test harness"—a validation system to check if the synthetic data was actually correct.
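The video doesn't show the harness itself, but the minimal shape of one (a sketch with illustrative names—`validate_sample` and a paired `check` assertion are mine, not PewDiePie's) is: compile the sample, run it, run its test, and reject it the moment anything fails:

```python
def validate_sample(code: str, check: str) -> bool:
    """Return True only if a synthetic sample both parses and passes its check.

    A sample that merely *looks* like code (the burger with razor blades
    inside) fails either the compile step or the assertion step.
    """
    try:
        compile(code, "<sample>", "exec")  # 1. must be syntactically valid
    except SyntaxError:
        return False
    namespace: dict = {}
    try:
        exec(code, namespace)   # 2. must run without crashing
        exec(check, namespace)  # 3. must pass its own test
    except Exception:
        return False
    return True

good = validate_sample("def add(a, b):\n    return a + b\n",
                       "assert add(2, 3) == 5")
bad = validate_sample("def add(a, b):\n    return a - b\n",
                      "assert add(2, 3) == 5")
print(good, bad)
```

The hard part, as PewDiePie discovered, isn't writing a harness like this—it's getting the checks strict enough to catch real garbage without silently letting a weaker version of the check wave it through.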
This is where things started going sideways. He kept finding problems with his test harness, so he'd fix it. Then he'd discover his "fix" just allowed more garbage data through. He describes scraping data, enriching data, testing data, augmenting data, managing eight different LLMs simultaneously—"This project was kind of like I was in the middle of a freeway and there's a bunch of cars that I had to direct constantly."
The First Training Run: Achievement Unlocked—Made It Worse
After months of data preparation, PewDiePie finally trained his model. He ran the benchmark, excited to see his improvements.
The model performed worse than the base model.
"I had made it worse. I probably should have quit then, but I am way too stubborn for that."
The problem? His data was full of issues he hadn't caught. When he'd "fixed" his validation harness, he'd actually just lowered the bar for what passed through. There were also "all these other issues like empty white spaces and classic coding issues that I just wasn't aware about."
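A first-pass filter for exactly those issues might look like this (a sketch of the idea, not PewDiePie's actual pipeline): drop empty or whitespace-only samples, strip trailing-whitespace noise, and deduplicate:

```python
def clean_dataset(samples: list[str]) -> list[str]:
    """Filter out common data-quality issues in scraped code samples:
    empty or whitespace-only entries, trailing-whitespace noise, and
    exact duplicates."""
    seen = set()
    cleaned = []
    for sample in samples:
        if not sample.strip():  # drop empty / whitespace-only entries
            continue
        # normalize trailing whitespace so near-identical samples collide
        normalized = "\n".join(line.rstrip() for line in sample.splitlines())
        if normalized in seen:  # drop exact duplicates
            continue
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned

raw = ["print('hi')   ", "   \n\t", "print('hi')", "x = 1"]
print(clean_dataset(raw))
```

None of this is sophisticated—which is rather the point. These are the unglamorous checks that, skipped once, quietly poison a training run.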
So he tried again. Cleaned the data. Fixed the real problems this time. Locked in.
"The model is worse."
Not just worse than his goal—worse than his previous failed attempt. He'd somehow made the model significantly worse.
Reacting to this moment, ThePrimeTime noted: "Oh, the brother locked in and actually made it significantly worse. Oh, that has to hurt a little bit."
What This Actually Teaches Us About AI
Here's the thing: PewDiePie's failure is more illuminating than most AI success stories you'll read. The standard narrative is that AI training is complex but fundamentally tractable—you get good data, you train your model, it gets better. Rinse, repeat.
The reality is that it's more like debugging code you didn't write, in a language you're still learning, where the error messages are probabilistic and the stack traces don't exist. You can do everything "right" and still make things worse because you didn't understand one subtle interaction in your data pipeline.
PewDiePie's experience also highlights something that doesn't get enough attention in AI discourse: the resource inequality. He references Chinese AI research papers that casually mention training on 2,048 GPUs while emphasizing this is "an economical approach." The joke is dark but accurate—when $60 million in compute is the economical option, what does that say about who gets to participate in AI development?
ThePrimeTime's commentary adds another layer. As a former software engineer turned content creator, he can translate PewDiePie's experience for a technical audience—but even he's surprised by some of the choices. When PewDiePie keeps pushing synthetic data generation despite it clearly not working, ThePrimeTime asks: "Why not just GitHub MIT that crap?" Why not just scrape MIT-licensed code and call it a day?
The answer seems to be that PewDiePie wanted synthetic data to work. Not because it was the best approach, but because he'd invested in understanding it. This is deeply human and also deeply at odds with how we typically frame AI development—as a purely rational, optimization-focused discipline.
The Benchmark Ceiling
Eventually, PewDiePie got his model to hit 16% on the benchmark—sometimes 15%, sometimes 14%, but 16% was the ceiling. Which, notably, was exactly the ceiling for the base model when using the "whole" format.
He'd fixed the format issue. He hadn't made the model smarter. He was still 2 percentage points away from beating ChatGPT-4, and months of work had produced... a different way to get the same result the base model already achieved.
The video (at least the portion shared) cuts off before we see if he ever crossed that 18% threshold. But in a way, whether he succeeded is less interesting than watching the process—the false starts, the stubborn commitment to approaches that weren't working, the gradual education in how AI actually behaves versus how you think it should behave.
PewDiePie's instinct early in the video turns out to be more profound than he probably intended: "Instead of focusing on gaming too much lately, I just been focusing on leveling up myself. But it's true. It's such an amazing feeling and I want you guys to experience it as well."
ThePrimeTime responds: "I can't believe Pewds is a better modern philosopher than most of tech Twitter."
Maybe that's the real story here. While AI researchers optimize benchmarks and tech executives promise AGI, a YouTuber with no machine learning background is documenting what it actually feels like to learn this stuff from scratch—the confusion, the setbacks, the slow accumulation of intuition about systems that don't behave the way you expect.
That's not the narrative that raises venture capital or gets cited in research papers. But it might be closer to how most people will actually experience AI development if the tools ever truly democratize—less "move fast and break things," more "move slowly and make things worse before maybe eventually making them slightly better."
—Marcus Chen-Ramirez
Watch the Original Video
PewDiePie beat chatGPT?
The PrimeTime
43m 2s