AI Pair Programming: Productivity Tool or

The most interesting thing Sam Anthony says in IBM's recent explainer on AI pair programming isn't in the pitch. It's buried near the end, almost as a footnote: "AI can be very confidently wrong, especially when it's not an expert in your business context."

That sentence is doing a lot of work. And if you're a developer writing code that touches authentication, payment flows, or anything a bad actor might want to get at, it should give you pause before you accept the next autocomplete suggestion.

I cover security, not developer productivity. But those two things aren't separate anymore, and that's the real story here.

What IBM is actually selling

The video frames AI pair programming as a natural evolution of human pair programming — two heads better than one, except now one head never sleeps, never needs to vent about the sprint planning meeting, and can generate test cases while you grab coffee. Anthony's pitch is genuinely coherent: AI handles the tedious parts of the development inner loop (context-switching, documentation, boilerplate), developers stay in the driver's seat on judgment calls, and the result is faster cycles with fewer blockers.

The productivity case is reasonable. The colleague-or-tool debate over how to classify these systems is interesting, but it's mostly academic when your deadline is Friday. What matters is whether the output is correct — and for security-sensitive code, correct means something more specific than "compiles and passes unit tests."

Human pair programming does have a research base behind it — Laurie Williams' foundational work suggested it catches bugs earlier and produces more maintainable code — but later meta-analyses have been more qualified, particularly for experienced developers. The evidence is real but not the slam dunk that productivity advocates tend to imply. AI pair programming inherits both the genuine benefits and the open questions.

The part the video skips

Here's what a developer productivity video from IBM is not going to spend time on: the attack surface that AI-generated code creates.

When an AI coding assistant suggests an implementation, it's drawing on patterns from its training data. For well-documented tools like GitHub Copilot, that training includes large volumes of public code. Other tools, such as Cursor and Amazon Q, have less publicly documented training data mixes, so the precise sourcing varies. What doesn't vary is the underlying dynamic: the model has seen a lot of code, including a lot of insecure code, because public repositories are full of it. The internet is not a curated security curriculum.

Research has found that AI-generated code can reproduce known vulnerable patterns — buffer overflows, SQL injection setups, insecure random number generation — with the same fluency it reproduces everything else. The model doesn't know it's doing this. It's not being malicious. It's completing patterns. That's the problem.
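To make the SQL injection case concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table, the function names, and the payload are all hypothetical, invented for illustration; nothing here comes from the IBM video or from any specific assistant's output. The point is only that the vulnerable version and the safe version look almost identical at a glance.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", 0), ("root", 1)],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable pattern: user input is pasted into the SQL text,
    # so a quote character in `name` can change the query itself.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn, name):
    # Parameterized query: the driver passes `name` as data,
    # never as part of the SQL statement.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
payload = "nobody' OR '1'='1"

unsafe_rows = find_user_unsafe(conn, payload)  # matches every row
safe_rows = find_user_safe(conn, payload)      # matches nothing
```

Both functions return the same result for a normal username, which is why a unit test written around ordinary input would pass for either one.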

Anthony's warning about "confident wrongness" is framed as a productivity issue: don't blindly accept AI output because it might be logically incorrect. But confident wrongness in a CRUD app is a bug. Confident wrongness in an authentication module is a CVE. The stakes aren't the same, and treating them as the same category of risk is where development teams can get into trouble.

This connects directly to what Anthropic found in their own research on AI and developer skill — the story is more complicated than "AI makes you better." There are real questions about whether heavy reliance on AI-generated code erodes the diagnostic instincts that let experienced developers recognize a subtly broken implementation when they see one.

Skill atrophy is the actual argument

I want to spend a moment on what I think is the most honest tension in Anthony's framing, because he's right about it even if he undersells the implications.

"Less time is spent writing code from scratch," he says, "and more time is spent outlining problems, designing systems, and evaluating the quality of solutions."

That's the optimistic version of a real shift. The less optimistic version: if a generation of developers writes significantly less code from scratch, they may become less equipped to evaluate the quality of solutions — because that evaluation skill is built by writing and debugging code yourself, failing at it, understanding why it failed. You can't audit code you don't fully understand, and you can't develop deep understanding primarily through review.

This isn't a theoretical risk. It's how skill atrophy works in every technical domain. The spreadsheet analogy gets invoked a lot in these conversations (calculators didn't kill mathematicians, spreadsheets didn't kill accountants), but it's messier than it looks. There's documented evidence that spreadsheet software meaningfully reduced accounting clerk employment through the 1980s and '90s, so it's not a clean parallel for "skills shift, not job loss." The more honest statement is: the role changes, some people adapt, some don't, and the skills that matter shift in ways that aren't always predictable in advance.

What to actually do with this

Anthony's bottom line is that active engagement is non-negotiable: "If you blindly accept everything AI produces, you're not really collaborating." Fair. But "don't be passive" is advice, not a framework. Here's what I'd actually want a developer to think through before leaning into AI pair programming at work:

What kind of code am I generating? The risk profile for AI-assisted boilerplate is very different from AI-assisted authentication logic or cryptographic implementation. Know which is which before you accept suggestions.

Does my review process account for the specific failure modes of AI-generated code? That means looking for overconfident implementations of security patterns, checking whether suggested dependencies have known vulnerabilities, and not assuming that "it looks right" means "it is right" — because AI output is optimized to look right.

What does my security team know about the tools we're using? This is a question many development teams aren't asking yet. If AI coding assistants are generating code that touches sensitive data flows, your security posture should account for that. The question of what data those tools send back to their servers is also worth asking explicitly — some tools have clearer data handling policies than others.

Am I building the skill or borrowing it? There's a difference between using AI to accelerate work you understand deeply enough to evaluate, and using it to produce work you couldn't produce or critique yourself. The first is a productivity tool. The second is a dependency.
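For a sense of what "it looks right" versus "it is right" means in review, consider the insecure random number generation pattern mentioned earlier. The sketch below is a hypothetical illustration in Python; the function names and the password-reset scenario are assumptions made for the example. Both functions return a 32-character hex string, but Python's random module is documented as unsuitable for security purposes, while the secrets module is designed for them.

```python
import random
import secrets

HEX_DIGITS = "0123456789abcdef"


def reset_token_predictable(seed):
    # Looks fine, but random.Random is a deterministic generator:
    # anyone who learns or guesses the seed can regenerate the token.
    rng = random.Random(seed)
    return "".join(rng.choice(HEX_DIGITS) for _ in range(32))


def reset_token_secure():
    # secrets draws from the operating system's
    # cryptographically secure randomness source.
    return secrets.token_hex(16)  # 16 bytes -> 32 hex characters


# Same seed, same "secret" token, every time.
first = reset_token_predictable(42)
second = reset_token_predictable(42)

secure = reset_token_secure()
```

A reviewer who only checks the output format would approve either version. Telling them apart requires knowing why the difference matters, which is the skill the last question is about.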

None of this means AI pair programming is a bad idea. The productivity gains Anthony describes are real and the framing — AI accelerates the loop, humans make the calls — is the right one. But "humans make the calls" only works if the humans are equipped to make them. Right now, that's less a given than the IBM video implies.

The promise of AI pair programming is that it makes developers more capable of tackling bigger problems. The open question is whether it's building that capability or borrowing against it.

Rachel "Rach" Kovacs is Buzzrag's cybersecurity and privacy correspondent.