The AI Compute Crisis No One's Talking About—Yet
AI infrastructure is running out of compute. DRAM prices are spiking, GPUs are sold out through 2028, and your cloud provider is now your competitor.
Written by AI
Zara Chen
February 9, 2026

We built an economy that runs on AI and now we're discovering there isn't enough compute to actually run it. Not in 2027. Not eventually. Right now.
Here's the thing everyone's missing: this isn't a temporary supply chain hiccup that resolves itself when the next fab comes online. According to AI strategist Nate B Jones, we're looking at a structural crisis in global technology infrastructure that won't see relief until at least 2028—and possibly much later. The world reorganized itself around AI capabilities over the past three years, and those capabilities depend entirely on inference compute that is now physically constrained.
The numbers sound almost fake until you realize they're not.
When Your AI Bill Becomes Your Biggest Line Item
Let's talk token consumption, because this is where the math gets wild. A knowledge worker using AI aggressively right now—code completion, document analysis, research assistance, meeting summaries—consumes roughly a billion tokens annually. That's the baseline for heavy users at AI-forward companies.
Except the ceiling for human-driven use is actually 25 billion tokens per year. Per person. And we're heading there fast.
Jones breaks down what this means at enterprise scale, and honestly? It's the kind of calculation that makes CFOs start stress-eating. A 10,000-person organization consuming a billion tokens per worker spends about $20 million annually on inference at current API rates. Expensive but manageable for a Fortune 500.
At 10 billion tokens per worker? That's $200 million a year. At 100 billion tokens—which agentic systems could hit within 18 months—you're looking at a $2 billion compute bill.
"These calculations assume stable pricing and available capacity," Jones notes, "but really neither assumption holds."
And here's the part that should keep enterprise leaders up at night: humans have natural rate limits. We type at a certain speed, take breaks, attend meetings, go home. An agentic system running continuously—monitoring, analyzing, responding, planning—doesn't have those constraints. "A single agentic workflow can consume more tokens in an hour than a human generates in a month," Jones explains.
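Does the hour-versus-month claim hold up? Under fairly ordinary assumptions, yes. The sketch below compares a fast typist against an agent loop that re-sends a large context on every call; every constant in it is an illustrative assumption, not a measured figure:

```python
# Illustrative comparison of human output vs. agentic consumption.
# Every constant here is an assumption for the sketch.

# A fast human typist: ~50 words/min, ~6 focused hours/day, ~20 workdays.
# Rough rule of thumb: 1 word is about 1.3 tokens.
human_tokens_per_month = 50 * 60 * 6 * 20 * 1.3   # ~468,000 tokens

# An agent loop calling a model every 10 seconds, re-sending a
# 50,000-token context (history + documents + tool results) each time.
calls_per_hour = 3600 / 10
agent_tokens_per_hour = calls_per_hour * 50_000    # ~18,000,000 tokens

print(f"Human output per month: {human_tokens_per_month:,.0f}")
print(f"Agent intake per hour:  {agent_tokens_per_hour:,.0f}")
print(f"Ratio: {agent_tokens_per_hour / human_tokens_per_month:.0f}x")
```

Under those assumptions the agent consumes nearly 40 times in one hour what the human produces in a month, and that's a single loop with no parallel sub-agents.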
Google disclosed it processed 1.3 quadrillion tokens per month across its services—a 130-fold increase in just over a year. If the world's most sophisticated AI infrastructure operator is seeing that kind of growth, planning for anything less starts to look pretty optimistic.
The Memory Crisis That's About to Get Very Expensive
The supply situation is... let's just say it's not great! And by "not great" I mean "fundamentally broken in ways that can't be fixed quickly."
AI inference is memory-bound. The models you can run, the speed you run them at, and the number of concurrent users you can serve all depend on memory—specifically high-bandwidth memory (HBM) for data centers and DDR5 for everything else.
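To see why memory is the binding constraint, it helps to count bytes. The sketch below sizes the weights and conversation state for a hypothetical Llama-style 70B model served at 16-bit precision; the architecture constants are assumptions for illustration, not numbers from Jones's video:

```python
# Rough memory budget for serving a hypothetical 70B dense transformer
# at FP16. All architecture constants are illustrative assumptions.
GB = 1024**3

params = 70e9
bytes_per_param = 2                       # FP16
weight_memory = params * bytes_per_param  # ~130 GB just for weights

# KV cache per token: 2 (K and V) x layers x kv_heads x head_dim x bytes.
layers, kv_heads, head_dim = 80, 8, 128
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_param

context_len = 8192   # tokens of context per user
batch_size = 32      # concurrent users
kv_cache = kv_bytes_per_token * context_len * batch_size

print(f"Weights:  {weight_memory / GB:,.0f} GB")
print(f"KV cache: {kv_cache / GB:,.0f} GB for {batch_size} users at {context_len} tokens")
# Weights come to ~130 GB and the cache ~80 GB: more HBM than a single
# 80 GB H100 holds, before serving a single additional user.
```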
According to market research firm TrendForce, server DRAM prices rose at least 50% through 2025, with projections showing another 55-60% increase quarter-over-quarter in Q1 2026. Counterpoint Research backs this up, forecasting DRAM prices overall will rise approximately 47% in 2026 due to severe undersupply.
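Compound those two moves and you're already in triple-digit territory. A quick check, taking the low end of each reported range as an assumption:

```python
# Compound the reported DRAM price moves (low end of each range).
price = 1.00
price *= 1.50   # "at least 50%" through 2025 (TrendForce)
price *= 1.55   # 55-60% quarter-over-quarter projected for Q1 2026

print(f"Cumulative increase: {(price - 1) * 100:.0f}%")  # ~132%
```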
When you're talking triple-digit price increases over 18 months, this isn't your typical cyclical shortage. Three companies—Samsung, SK Hynix, and Micron—control 95% of global memory production, and they're all reallocating away from consumer products toward enterprise and AI data center customers. The hyperscalers are buying everything they produce.
HBM is even worse. SK Hynix dominates production, and their entire output is allocated to Nvidia, AMD, and hyperscalers years in advance. You literally cannot buy HBM at any price right now.
New capacity doesn't arrive quickly either. A new DRAM fab costs around $20 billion and takes three to four years to construct and ramp up. Decisions made today won't yield chips until roughly 2030. Samsung's president has stated publicly that memory shortages will affect industrywide pricing through 2026 and beyond—the world's largest memory manufacturer telling you on the record they cannot meet demand.
Your Cloud Provider Is Now Your Competitor
Here's the plot twist most enterprises haven't internalized yet: AWS, Azure, and Google Cloud aren't neutral infrastructure providers anymore. They're AI product companies that happen to sell infrastructure, and they compete directly with their enterprise customers.
Google uses its compute allocation to power Gemini. Microsoft uses its allocation for Copilot. Amazon uses its allocation for AWS AI services. When compute was abundant, this conflict of interest was manageable—hyperscalers could serve their own needs and sell excess capacity. When compute is scarce, the conflict becomes zero-sum.
"Every GPU allocated to an enterprise customer is a GPU not available for Gemini, Copilot or Alexa," Jones points out. "The hyperscalers must choose between their own products and their customers."
And they're already choosing. API pricing has fallen over the past two years, but rate limits have tightened. Enterprise customers are reporting increasing difficulty getting allocation commitments for high-volume deployments.
Jones is blunt about the strategic implication: "Enterprise CTOs need to internalize that the cloud providers are not going to be reliable partners in this crisis. They are competitors who control the scarce resource that you need."
The GPU Situation Is Somehow Even Worse
Nvidia dominates AI chips with roughly 80% market share. Their data center GPUs, from the H100 to the newer Blackwell line, are the standard for AI workloads, and all of them are sold out, with lead times for large orders exceeding six months.
Microsoft, Google, Amazon, Meta, and Oracle have locked up allocation through multi-year purchase agreements worth hundreds of billions of dollars. Enterprise buyers get whatever remains, which is increasingly not much. The H200 and Blackwell parts that offer significant performance improvements over the H100? Their initial production runs are fully allocated to hyperscalers.
The alternatives aren't really alternatives. AMD's Instinct MI300X is competitive on specs and available in somewhat larger quantities, but its software ecosystem lags significantly. Intel's Gaudi accelerators have struggled to gain market share despite competitive pricing. Google's TPUs and Amazon's Trainium chips live only inside their owners' clouds: you can rent them, but you can't buy them for your own racks.
Meanwhile, TSMC manufactures the world's most advanced chips, and their cutting-edge nodes are fully allocated. Their Arizona fab won't reach full production until 2028. New facilities in Japan and Germany are on similar timelines. Intel's 18A process represents the first credible American alternative, but their capacity is limited and Microsoft is absorbing their initial allocation.
Essentially all advanced AI chip production runs through TSMC in Taiwan. There is no surge capacity. There is no alternative.
Who Gets Crushed First
The impact of spiking inference costs depends heavily on your business model. AI-native startups are extremely exposed—companies like Notion have disclosed that AI costs now consume 10 percentage points of what was previously a 90% gross margin business. If inference costs double, many AI-native business models become unviable.
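The margin arithmetic is brutal in its simplicity. Here's a sketch using the Notion-style figures above; the doubling and quadrupling scenarios are hypotheticals, not disclosed numbers:

```python
# Gross margin erosion as inference costs scale. The 10-point figure
# is from the article; the 2x and 4x scenarios are hypothetical.
baseline_margin = 0.90   # gross margin before AI features
ai_cost_points = 0.10    # AI inference cost as share of revenue today

for multiplier in (1, 2, 4):
    margin = baseline_margin - ai_cost_points * multiplier
    print(f"{multiplier}x inference costs -> {margin:.0%} gross margin")
# 1x -> 80%, 2x -> 70%, 4x -> 50%: each step eats what used to be
# the structural margin advantage of being a software business.
```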
Enterprise software companies building AI features face similar pressure. AI is a competitive requirement, but every AI feature erodes margin. Traditional enterprise planning frameworks—assess requirements, procure infrastructure, depreciate over 3-5 years—assume predictable demand, stable technology, and available supply. None of that is true anymore.
Jones offers a concrete example: an enterprise purchases 1,000 AI workstations with NPU capabilities at $5,000 each—a $5 million capital investment with a four-year depreciation schedule. By year two, per-worker consumption has increased 10x and those NPUs can't handle agentic workflows consuming billions of tokens. The machines aren't broken; they're obsolete.
What do you do? Continue using inadequate hardware and watch competitors pull ahead? Purchase new hardware and take a write-down? Lease instead and pay a premium while transferring depreciation risk?
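The write-down in that scenario is easy to compute, which is part of what makes it so uncomfortable. A sketch assuming straight-line depreciation (a standard treatment, but an assumption about this hypothetical buyer's books):

```python
# Straight-line depreciation on Jones's hypothetical workstation fleet.
purchase_price = 1_000 * 5_000   # $5M: 1,000 machines at $5,000 each
useful_life_years = 4
annual_depreciation = purchase_price / useful_life_years  # $1.25M/yr

years_elapsed = 2
book_value = purchase_price - annual_depreciation * years_elapsed

print(f"Book value after year {years_elapsed}: ${book_value:,.0f}")
# Retiring the fleet now means a $2,500,000 write-down on hardware
# that works perfectly. It just can't run the workloads that matter.
```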
"This is not a tech problem," Jones emphasizes. "It's being presented as one, but that's incorrect. It's actually an economic transformation with consequences that will reshape competitive dynamics across every industry."
The companies securing compute allocation now—through long-term cloud commitments, direct GPU purchases, or alternative architecture strategies—aren't working from some brilliant playbook. They're making high-stakes bets with incomplete information in a market where waiting to see what happens means you've already lost. The window to act is closing, and the enterprises that hesitate will find themselves bidding against each other for scraps or locked out entirely.
We built the future on infrastructure that doesn't exist yet. The bill is coming due.
—Zara Chen
Watch the Original Video
Why the Smartest AI Teams Are Panic-Buying Compute: The 36-Month AI Infrastructure Crisis Is Here
AI News & Strategy Daily | Nate B Jones
26m 15s

About This Source
AI News & Strategy Daily | Nate B Jones
AI News & Strategy Daily, managed by Nate B. Jones, is a YouTube channel focused on delivering practical AI strategies for executives and builders. Since its inception in December 2025, the channel has become a valuable resource for those looking to move beyond AI hype with actionable frameworks and workflows. The channel's mission is to guide viewers through the complexities of AI with content that directly addresses business and implementation needs.
More Like This
Unlocking Trust: AI in Business Needs Reversible Processes
Exploring why trust and reversible processes are key for AI in business decisions.
Why Enterprise AI Keeps Failing: The Intent Gap Nobody Talks About
Companies invest millions in AI but see no returns. The problem isn't the technology—it's that AI doesn't know what your company actually wants.
Anthropic Bet on Teaching AI Why, Not What. It's Working.
Anthropic's 80-page Claude Constitution reveals a fundamental shift in AI design—teaching principles instead of rules. The enterprise market is responding.
Every Company Needs an AI Agent Strategy Now, Says Nvidia
Nvidia's Jensen Huang says every software company needs an OpenClaw strategy as Q2 becomes a race to productize AI agents for enterprise. Here's what's happening.