AI Models Are Now Building Their Next Versions
Major AI labs confirm their models now participate in their own development, handling 30-50% of research workflows autonomously. The recursive loop has begun.
Written by AI · Bob Reynolds
March 25, 2026

Photo: Matthew Berman / YouTube
The transition happened quietly, without fanfare. While the industry debated whether artificial general intelligence was two years or twenty years away, the major AI labs crossed a different threshold: their models began participating in their own development.
This isn't speculation. It's in their documentation.
Minimax, a Chinese AI lab, released technical details about their M2.7 model in April. Buried in the announcement was this: "M2.7 is our first model deeply participating in its own evolution." The model updates its own memory, builds skills for reinforcement learning experiments, and improves its learning process based on results. According to Minimax, the system now handles 30 to 50 percent of the research workflow autonomously.
The process works like this: a researcher discusses an experimental idea with the AI agent. The agent conducts literature review, designs experiment specifications, pipelines data, and launches experiments. It writes code, runs tests, analyzes results, and reports back. The human reviews, provides direction, and the loop continues. What once required teams of specialized engineers now runs with decreasing human involvement.
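The cycle described above can be sketched as a simple control loop. This is a minimal illustration only: the function names and data structures are hypothetical stand-ins, not Minimax's actual system.

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    """One iteration of the human-guided research loop described above."""
    idea: str
    spec: str = ""
    results: dict = field(default_factory=dict)

def agent_design(idea: str) -> str:
    # Stand-in for the agent's literature review and experiment specification.
    return f"spec for: {idea}"

def agent_run(spec: str) -> dict:
    # Stand-in for writing code, pipelining data, launching the experiment,
    # and analyzing results.
    return {"spec": spec, "metric": 0.0}

def research_loop(idea: str, human_review, max_iters: int = 3) -> Experiment:
    exp = Experiment(idea)
    for _ in range(max_iters):
        exp.spec = agent_design(exp.idea)    # agent: design the experiment
        exp.results = agent_run(exp.spec)    # agent: execute and analyze
        direction = human_review(exp)        # human: review, give direction
        if direction is None:                # human accepts; loop ends
            break
        exp.idea = direction                 # loop continues with new direction
    return exp
```

The human stays in the loop only at the review step; everything between reviews is agent execution, which is where the "30 to 50 percent autonomous" figure comes from.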
OpenAI said the same thing, more directly. When they released GPT-5.3 Codex, the announcement noted: "GPT 5.3 Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its deployment, and diagnose test results and evaluations."
Not just helping build the next model—helping build itself. Early checkpoints of Codex optimized later checkpoints of the same model.
The Pattern Across Labs
Anthropic won't say it explicitly. That's their style—cautious, measured, allergic to hype. But their strategy reveals the same trajectory. Since Claude's launch, Anthropic has focused relentlessly on its coding capabilities. Why? The obvious answer is revenue—every engineering team on Earth is buying AI coding assistance. The less obvious answer is infrastructure.
When your AI excels at research and coding, it builds the tools to improve itself. Development tooling, infrastructure management, deployment systems—all the scaffolding that makes it possible to train and serve models at scale. An internal document from Anthropic, dated July 2024, described "autonomous loops where Claude Code writes the code for a new feature, runs tests, and iterates continuously."
If you've watched Anthropic's release velocity in recent months, you've seen the result. They're shipping faster than anyone in the industry.
Google entered this territory even earlier. AlphaEvolve, released in June 2024, improved Google's system-wide architecture in ways that saved billions of dollars. The model discovered faster matrix multiplication—the first improvement to that fundamental operation in fifty years. Every AI model trained after that discovery is faster as a result. That's recursive improvement at the infrastructure level.
What This Actually Means
Sam Altman stated OpenAI's timeline in October: an AI research intern by September 2026, running on hundreds of thousands of GPUs. A full AI researcher by March 2028. That March date felt oddly precise at the time. Five months later, it looks conservative.
The core question isn't whether AI will participate in its own development. That's happening. The question is how fast the loop tightens.
Right now, these systems require substantial human guidance. Researchers design experiment directions, review results, make strategic decisions. The AI handles execution, analysis, iteration within those boundaries. That split—30 to 50 percent autonomous, per Minimax's numbers—is the current state.
But the direction of that percentage is the entire story. Six months ago, it was lower. Six months from now, it will be higher. The trajectory matters more than the snapshot.
Andrej Karpathy, former director of AI at Tesla and a founding member of OpenAI, recently open-sourced a project called AutoResearch. It's a framework for autonomous AI experimentation that individual developers can run. Point it at a problem—say, optimizing training for a GPT-2-scale model—and it designs experiments, runs them, analyzes results, and iterates. Karpathy reported achieving the fastest training time for such models on record after a single night of autonomous experimentation.
Matthew Berman, a developer and AI researcher, described his own implementation. He uses frontier models to design fine-tuning experiments for smaller, specialized models. The system runs overnight, testing whether fine-tuned local models can replace expensive API calls to services like Claude. When an experiment fails, the AI analyzes why, generates new hypotheses, adjusts parameters, and tries again. No machine learning background required—just the ability to direct the system toward a goal.
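Berman's overnight setup has the same shape as a simple hypothesis-and-retry loop. The sketch below simulates it with placeholder functions; the names are hypothetical, and the scoring uses random numbers where the real system would actually fine-tune a local model and compare it against an API baseline.

```python
import random

def propose_config(history):
    # Stand-in for asking a frontier model to suggest the next fine-tuning
    # configuration, given the history of past attempts and failures.
    return {"lr": random.choice([1e-5, 3e-5, 1e-4]),
            "epochs": random.choice([1, 2, 3])}

def finetune_and_eval(config):
    # Stand-in for fine-tuning a small local model and scoring it on the
    # same tasks the expensive API model handles. Simulated here.
    return random.uniform(0.5, 1.0)

def overnight_run(api_baseline=0.85, budget=20):
    """Keep proposing and testing configs until one matches the API baseline."""
    history = []
    for _ in range(budget):
        config = propose_config(history)
        score = finetune_and_eval(config)
        history.append((config, score))
        if score >= api_baseline:    # local model matches the API model
            return config, score, history
    return None, max(s for _, s in history), history
```

Each failed trial feeds back into `propose_config`, which mirrors the step where the AI analyzes why an experiment failed and adjusts parameters before trying again.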
The Uncomfortable Parts
Leopold Aschenbrenner, formerly of OpenAI, wrote a paper after leaving the company arguing this recursive loop would arrive faster than consensus predicted. He included a graph: a flat line representing current progress, then an exponential curve representing what happens once automated AI researchers remove the human bottleneck. His argument was that we're standing at the base of that curve right now.
The evidence suggests he was right about the timing. Whether he's right about the shape of the curve is unknowable until it happens.
What we can observe: the gap between "AI assists research" and "AI conducts research" is narrowing in measurable ways. The percentage of autonomous workflow increases. The iteration time decreases. The expertise required to participate drops—Berman's experience demonstrates that clearly.
The labs are clear-eyed about this. Minimax's documentation describes a "cycle of model self-evolution." OpenAI's Altman has stated publicly that autonomous AI researchers are the "core thrust" of their research program. These aren't side projects. They're the main effort.
What Changes
If AI development accelerates significantly—not guaranteed, but increasingly plausible—the constraint shifts from researcher availability to compute availability. There's substantial compute available. The hyperscalers are building data centers at unprecedented scale. The question becomes whether that compute translates to capability gains, or whether we hit other bottlenecks: data quality, architectural limitations, diminishing returns.
We don't know. We're watching it unfold in real time, with incomplete information and motivated actors on all sides. The labs have incentives to talk up their capabilities. Skeptics have incentives to dismiss progress as hype. The truth is somewhere in the observable behavior: what these systems can actually do, measured against what they could do six months ago.
What they can do now is participate in their own improvement at a level that was theoretical speculation two years ago. That's the data point. What happens next depends on whether that participation accelerates capability gains or encounters natural limits.
The recursive loop has started. How many iterations it runs, and how much each iteration matters, remains the only question that actually counts.
Bob Reynolds is Senior Technology Correspondent for Buzzrag.
Watch the Original Video
Hard Takeoff has started
Matthew Berman
17m 29s
About This Source
Matthew Berman
Matthew Berman is a leading voice in the digital realm, amassing over 533,000 subscribers since launching his YouTube channel in October 2025. His mission is to demystify the world of Artificial Intelligence (AI) and emerging technologies for a broad audience, transforming complex technical concepts into accessible content. Berman's channel serves as a bridge between AI innovation and public comprehension, providing insights into what he describes as the most significant technological shift of our lifetimes.