
Async Rust Performance: What Most Developers Get Wrong

Code to the Moon breaks down async Rust and Tokio misconceptions that kill performance. Single-threaded concurrency vs parallelism explained.

Written by AI. Tyler Nakamura

April 16, 2026


Photo: Code to the Moon / YouTube

Here's something that'll mess with your head: you can execute 500 HTTP requests simultaneously using only two operating system threads. Not in theory—in practice. And if you're writing async Rust without understanding how that works, you're leaving serious performance on the table.

Code to the Moon just dropped a video that walks through the mechanics of async Rust and the Tokio framework, and honestly? It made me rethink some assumptions I didn't even know I had. The creator starts with a deceptively simple example—two HTTP requests using Tokio—and methodically reveals why most developers' mental models of what's happening under the hood are just... wrong.

The Single-Threaded Surprise

The first gut-punch comes early. You've got two async requests. You're using the Tokio join macro to wait for both. They kick off without blocking each other. Your brain screams "parallelism!" and you assume Tokio's dispatching these to worker threads.

Nope. "This is a single-threaded application, and the requests are being executed concurrently," the video explains. "There's no parallelism happening yet."

Wait, what?

Turns out, when you await a future in Tokio, you're yielding control back to the scheduler. That await is essentially saying "hey, I'm waiting on something external, feel free to do other stuff with this thread." The scheduler then runs your second request on that same thread. Both requests are in flight, but you're juggling them on a single thread through cooperative multitasking.

This isn't a bug—it's the entire point. Concurrency without the overhead of spinning up multiple OS threads. For I/O-bound operations (like HTTP requests), this is actually ideal. You're not burning CPU cycles; you're waiting on network responses. One thread can handle dozens, hundreds, even thousands of these operations by switching between them when they're blocked.

When Blocking Operations Break Everything

But here's where it gets spicy. The video switches to a synchronous HTTP client (ureq instead of the async reqwest). Same code structure, same Tokio join macro. The result? Everything falls apart. Back to serial execution.

Why? Because synchronous blocking operations can't yield. They just... sit there, hogging the thread until they're done. Your careful async setup becomes meaningless.

This is where Tokio's spawn comes in. Wrap your async function in tokio::spawn() and you're actually dispatching work to Tokio's worker thread pool. By default, you get one worker thread per CPU core—a sensible default that the creator notes they've "never actually had to change."

With worker threads in play, even blocking operations can run in parallel. Two synchronous HTTP requests spawned as Tokio tasks will execute simultaneously on separate worker threads. Progress.

The Main Thread Plot Twist

Then comes the third reveal, and this one's genuinely clever. The video adds a third request—not spawned, just running on the main thread—alongside two spawned tasks using synchronous blocking calls. Only two worker threads configured.

Basic math says you can only handle two parallel operations, right? But all three requests complete simultaneously.

"The punchline here is that we still have the main thread to work with," the creator explains. Even with just two worker threads, you've got three threads total. The main thread isn't sitting idle while your worker threads do the heavy lifting—it's part of the workforce.

This creates an interesting dynamic. Add a fourth spawned task and you hit a bottleneck—three threads can only handle three blocking operations in parallel. That fourth request has to wait.

But switch back to async operations? All four requests kick off simultaneously. Because async operations yield when blocked, that three-thread limit stops mattering. You're back to concurrency magic.

The 500-Request Thought Experiment

The video escalates to a hypothetical: 500 Tokio tasks, each making an HTTP request. Still just two worker threads. With async operations, the claim is all 500 requests could launch and await responses concurrently.

"This avoids spinning up and tearing down operating system threads," the creator notes. "There is a lot of overhead in spinning up and tearing down operating system threads."

The creator wisely doesn't actually run this ("I don't want Hacker News to think that we're DDoS-ing them"), but the math checks out. Each await yields the thread. The scheduler keeps rotating through ready tasks. You're not thread-limited; you're only bound by memory and network capacity.

Switch those 500 requests to blocking operations, though? Now you're stuck. Two worker threads means two parallel operations max. The other 498 sit in a queue. Your async architecture just became a very expensive way to achieve worse-than-synchronous performance.

The spawn_blocking Escape Hatch

So what do you do when you genuinely need to run blocking or CPU-intensive operations in an async context? Tokio's answer is spawn_blocking.

Unlike spawn, which dispatches to the worker thread pool, spawn_blocking hands each operation to a separate pool of threads reserved for blocking work, spun up on demand (up to 512 of them by default). "That's what you want so that you don't saturate your worker threads with blocking operations and block your task pipeline," the video explains.

It sounds wasteful—isn't the whole point of async to avoid thread overhead? But the alternative is worse. A few blocking operations clogging your worker threads can tank the performance of hundreds of async tasks. Better to quarantine the blocking work to dedicated threads and keep your async pipeline flowing.

What This Actually Means for Real Code

Here's what I find fascinating: async Rust's power comes from understanding what it's not doing as much as what it is. It's not creating threads for every operation. It's not even necessarily using all your CPU cores. It's optimizing for I/O-bound workloads where most of your time is spent waiting, not computing.

Mix in blocking operations without thinking, and you've just built an elaborate foot-gun. But use the right primitives—spawn for async tasks that might benefit from parallelism, spawn_blocking for unavoidable blocking work, and plain await for I/O-bound operations—and you've got something genuinely powerful.

The video's running example involves HTTP requests, which makes sense pedagogically. But the implications extend to database queries, file I/O, any operation where you're waiting on external systems. Get this model right and your application can handle massive concurrency without the resource overhead of traditional thread-per-request architectures.

Get it wrong—immediately await every future, block worker threads with synchronous operations, spawn tasks that don't need spawning—and you're just writing slow code with extra steps.

The gap between those two outcomes isn't subtle. And based on the "frequently misunderstood" framing in the video, a lot of developers are stumbling into the slow version without realizing it.

— Tyler Nakamura, Consumer Tech & Gadgets Correspondent

Watch the Original Video

Async Rust Demystified: Next-Level Performance Many Will Miss


Code to the Moon

9m 21s
Watch on YouTube

About This Source

Code to the Moon


Code to the Moon is a YouTube channel created by a professional software developer with over 15 years of experience. Boasting a subscriber base of 82,100, the channel has been active for a little over a year, offering in-depth insights into modern programming languages like Rust, as well as cutting-edge software development tools. The channel is designed for both seasoned developers and enthusiastic learners aiming to enhance their skills in software development.

