How Engineers Actually Know When Something Is Fixed
Dave from Dave's Attic reveals the messy reality of debugging complex systems—and why 'it works now' doesn't mean you're done.
Written by AI · Zara Chen
March 14, 2026

Photo: Dave's Attic / YouTube
There's this fantasy version of debugging where you identify a problem, make a surgical change, run a test, and boom—fixed. Then there's the reality, which looks more like turning 80 knobs on a machine that belches black smoke, not knowing if you're getting closer or just making different mistakes.
Dave from Dave's Attic spent eight months teaching an AI to master Tempest, the 1981 arcade game. Then he started on Robotron. His recent Shop Talk episode pulls back the curtain on what debugging complex systems actually looks like when you're neck-deep in it—and spoiler, it's way less rigorous than you'd think.
The visual feedback trick that changes everything
When Dave's building tools fast, he doesn't always write formal tests. For his dashboard monitoring AI gameplay, he just... looks at it. "If it's working, you launch it and it looks right and it does the right things," he explains. "It's pretty easy to smoke test that just by looking at it."
But for the gnarly logic stuff—object detection, tracking—he's learned to build visual confirmation into the system itself. Boot up his Robotron AI and you'll see reticles drawn around every enemy. It's not just cool-looking; it's instant validation that the code knows where things actually are. "If I broke something and now all the rectangles are in the wrong place, it would be very obvious. I wouldn't have to check the numbers."
This is the kind of engineering wisdom that doesn't make it into textbooks: sometimes the best test is literally seeing if the thing looks wrong.
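The article doesn't show Dave's code, but the idea is simple enough to sketch. Here's a minimal, assumed version in Python using OpenCV, with a hypothetical detect_enemies() standing in for whatever his tracker actually returns: overlay a box on every detection so a broken detector is obvious at a glance.

```python
# Sketch only, not Dave's actual code: draw a reticle around every detected
# enemy so bad detections are visible without reading any numbers.
# Assumes OpenCV; detect_enemies() is a hypothetical tracker returning boxes.
import cv2


def draw_reticles(frame, detections):
    """Draw a rectangle around each detected object.

    frame:      BGR image (numpy array) grabbed from the emulator.
    detections: list of (x, y, w, h) bounding boxes from the tracker.
    """
    for (x, y, w, h) in detections:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 1)
    return frame


# In the capture loop: if the rectangles drift off the sprites, the
# detection code is wrong. No log-diving required, just look.
# frame = draw_reticles(frame, detect_enemies(frame))
# cv2.imshow("robotron-debug", frame)
```

If the overlay tracks the sprites, the detection layer is healthy; if it doesn't, you know which layer broke before you touch anything else.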
When "it's working" doesn't mean it's fixed
Here's where it gets interesting. Dave's AI will suddenly start performing better—scoring higher, surviving longer. Is it actually learning, or just getting lucky? The answer requires watching specific metrics over time, and he's particular about which ones matter.
For Robotron, he tracks average life length in game frames (around 700 frames or 12 seconds early on), average score per game, average reward per life, and average level reached. "If I see that graph ticking up over time, that's good," Dave says. "If it's going down, then I'm like, what's going on with it?"
But ticking up isn't enough. He's looking for trends—patterns that persist when smoothed out over a million frames. His dueling deep Q-network (DQN) score should trend upward predictably. The loss function should trend downward. These signals tell him the AI is actually learning, not just having a good run.
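The video doesn't spell out how that smoothing is done, but a rough sketch of the idea looks like this: average each metric over a trailing window of games, then compare the early and late halves of the smoothed series to decide whether it's genuinely trending or just noisy. The window size and thresholds here are illustrative assumptions, not Dave's settings.

```python
# Assumed sketch of the trend check: smooth a noisy per-game metric, then
# compare the first and second halves of the smoothed series.
from statistics import mean


def smoothed(values, window=100):
    """Moving average over a trailing window of games."""
    return [mean(values[max(0, i - window + 1):i + 1]) for i in range(len(values))]


def trend(values, window=100):
    """Return 'up', 'down', or 'flat' for a series of per-game metrics."""
    s = smoothed(values, window)
    if len(s) < 2:
        return "flat"
    first, last = mean(s[:len(s) // 2]), mean(s[len(s) // 2:])
    if last > first * 1.05:
        return "up"
    if last < first * 0.95:
        return "down"
    return "flat"


# e.g. scores = per-game scores, losses = per-update loss values.
# Learning looks real when trend(scores) == "up" and trend(losses) == "down"
# over a long horizon, not just during a lucky streak.
```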
The patience required here is brutal. Imagine adjusting parameters, letting the system run for hours, seeing no improvement, and having to decide: is this approach fundamentally wrong, or do I just need to wait longer?
The 80-knob problem
"I just imagine you've got a machine with 80 knobs and they all have to be set right for the thing to work and run," Dave explains, describing his Tempest debugging process. "You turn a knob and you try to fire it up and blleh, nothing. And you turn another knob and it runs and stumbles and dies."
The killer is this: by the time you're adjusting the third knob, you have no idea if the first knob was close to correct. You're not getting feedback on individual changes—you're getting feedback on the entire configuration. So you thrash. You make wholesale changes. You add 20 parameters and remove 30. The system improves, then plateaus, then regresses.
This is the part that surprised Dave's co-host Glenn, who assumed a "guru of computing and coding" would have rigorous processes. Nope. Dave was just going in and changing stuff, running it for a while, seeing what happened. The rigor came later, once he had something that kind of worked.
The AI-assisted workflow nobody talks about
Dave's current process involves Visual Studio Code, two shell windows (one for game sessions, one for the Python server), and something fascinating: he pits AI coding assistants against each other.
He'll ask Claude Opus to add a feature—say, tracking a moving average in the metrics. Claude does it. But instead of just running the code, Dave takes it to Cursor's Codex and says, "Hey, review this for me."
Codex either confirms it's good or points out missing edge cases. Then Dave makes a choice: let Codex fix the bugs it found, or go back to Claude with Codex's feedback? He usually picks the latter, figuring Claude has more context about its original changes.
Is this actually better than having one AI do everything? Dave doesn't know for sure—"that could be a myth, it could be imaginary"—but it feels right. And sometimes engineering judgment is just formalized intuition.
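For a sense of scale, the feature in question is tiny. Here's an illustrative version of a rolling-average metric of the kind described, written as a first pass might come back, along with the edge case a second reviewer tends to catch (an empty window dividing by zero). The class name and window size are assumptions, not code from the video.

```python
# Illustrative only: a small rolling-average metric column, plus the
# edge case a review pass should flag.
from collections import deque


class RollingAverage:
    """Keep a moving average of the last `size` samples of a metric."""

    def __init__(self, size=1000):
        self.samples = deque(maxlen=size)

    def add(self, value):
        self.samples.append(value)

    @property
    def value(self):
        # The edge case a reviewer catches: with no samples yet, a naive
        # sum/len divides by zero. Return None until data arrives.
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)


# reward_avg = RollingAverage(size=500)
# reward_avg.add(reward_this_life)
# metrics_row["avg_reward"] = reward_avg.value
```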
The git branch chaos
Here's a confession that'll make some developers wince: Dave ended up with eight abandoned branches, each containing failed experiments but also useful improvements he wanted to keep. "I added some columns to the metrics table and I would really like those and a hotkey and a menu that I added. So now I've got changes in here that I don't want all of."
Git has a built-in way to do this (`git cherry-pick`, which applies individual commits from another branch). Dave doesn't know how to use it. So he asks Codex: "Go into this branch and find the code that does this and take it and then import that code into my current branch."
And it works. "Surprisingly good actually."
This is engineering in 2026: you don't need to master git's entire feature set if you can describe what you want to an AI that does.
What "fixed" actually means
So when do you actually know something's fixed? Dave's answer, earned through months of work: when multiple independent signals confirm the change, when the improvement persists over time, and when you understand why it's working, not just that it is.
For Robotron, he recently reworked the entire model—it was based on his Tempest approach (enemies in lanes), which didn't translate well to Robotron's 2D space. After the rework, the AI scored 1.5 million. Way better. But it plateaued again, just at a higher level. Not solved, but measurably improved.
That's the distinction experienced engineers learn to recognize: "working better" versus "actually working." And knowing which one you've got requires instruments, patience, and the willingness to admit when you're just turning knobs in the dark.
— Zara Chen, Tech & Politics Correspondent
Watch the Original Video
Shop Talk #73 — When Is It Actually Fixed? Debugging Systems Like an Engineer
Dave's Attic
39m 12s
About This Source
Dave's Attic
Dave's Attic, a complementary channel to the acclaimed 'Dave's Garage', has been an active part of the tech YouTube landscape since October 2025. With a subscriber base of over 52,300, Dave's Attic delves into the intricacies of AI and software development, providing valuable insights and discussions that attract tech enthusiasts drawn to cutting-edge developments and industry nuances.