
C++ Parallel Range Algorithms: What's Actually Changing

Ruslan Arutyunyan breaks down P3179, the proposal bringing parallel execution to C++ ranges—and why the design choices matter more than you'd think.

Written by AI. Yuki Okonkwo

February 6, 2026


Photo: CppCon / YouTube

If you've been using C++17's parallel algorithms, you know the drill: slap an std::execution::par at the front of your algorithm call and boom—parallelism. But you've probably also noticed the rough edges. The verbosity when you need to chain operations. The fact that everything assumes your sequences are the same size. The weird dance you do with fancy iterators that don't quite compile the way you think they should.

Ruslan Arutyunyan, lead developer of Intel's oneAPI DPC++ library and newly minted co-chair of the C++ committee's concurrency group, spent an hour at CppCon 2025 walking through P3179—the proposal that aims to marry C++20 ranges with parallel execution. It's not just "add ranges to parallel algorithms." The design choices reveal something more interesting: what happens when you try to reconcile the elegant composability of ranges with the brutal pragmatism of parallel execution.

The Problem Space

Consider a straightforward task: transform some data, reverse it, then find the first element matching a predicate. In C++17 parallel algorithms, you're writing three separate calls—std::transform, std::reverse, std::find_if—each with its own parallelization overhead. Each call has to complete before the next begins. "The unnecessary work might be skipped only for the last algorithm," Arutyunyan explains. If find_if discovers a match early, it still had to wait for the entire reverse operation to finish.

You could get clever with transform iterators and reverse iterators, composing everything into a single lazy pipeline. But as Arutyunyan demonstrates with audience participation, that code doesn't compile. The lambda types don't match even when the lambdas are spelled identically. You need type erasure or careful lambda management. It works, but it's "unnecessarily verbose."

C++20 ranges offered a better path: views that compose lazily, clean pipeline syntax. Except you couldn't parallelize them. Not officially, anyway. The proposal bridges that gap, letting you write std::ranges::find_if(std::execution::par, input | std::views::transform(fn) | std::views::reverse, predicate). One algorithm call. Parallel execution. The unnecessary work gets skipped across the entire pipeline.

That's the sales pitch. The interesting part is what had to change to make it work.

Random Access or Nothing

The first major departure from C++17: parallel range algorithms require random access iterators, not just forward iterators. This immediately eliminates some use cases—filtered ranges, for instance, can't be randomly accessed without materializing them first.

Arutyunyan is refreshingly candid about why. "Random access iterator is a must for efficient parallelization for now. This is the best abstraction that we have in the standard." He points out that despite C++17 officially supporting forward iterators for parallel algorithms, almost nobody implements it that way in practice. Intel's oneDPL, Nvidia's Thrust, GNU libstdc++—they all went with random access. Only Microsoft STL supports forward iterators, and even then, not performantly.

There's a slightly chaotic origin story here. Arutyunyan recounts asking Daisy Hollman (who worked on the original parallel algorithms) why forward iterators were chosen. Her response: "If I remember correctly, it was input [iterators] till almost like the very end. And somebody said like 'why is it input?' and everybody in the committee just was like 'yeah, why is input? Let's just make it forward.'" Good design process, right?

The random access requirement isn't forever, though. Arutyunyan hints at "some middle ground" between forward and random access that could emerge—something that would let filter views participate without abandoning parallelization efficiency. But that's a future proposal. For C++26, the line is drawn at random access.

Sized Ranges: Memory Safety Strikes Back

Alongside random access comes another requirement: sized ranges. You need to know the size upfront. This serves two purposes.

First, memory safety. "We're not going to do some memory overrun," Arutyunyan emphasizes. Unlike C++17 algorithms that might happily write past the end of your output if you miscalculated, parallel range algorithms check boundaries. If your output range is smaller than your input, the algorithm stops when the output is exhausted. You get back iterators indicating exactly where processing stopped, and you can check if that's an error condition or expected behavior.

Second, performance. Parallel algorithms need to partition work across threads. Not knowing the size means either conservative assumptions or expensive runtime discovery. Neither scales.

This does eliminate some corner cases—C-strings are random access but not sized until you traverse them looking for the null terminator. Unbounded std::views::iota is another casualty. These seem like acceptable trade-offs for making parallelism actually work, but they're still trade-offs.

Range-As-Output: The Controversial Part

The biggest design debate wasn't about inputs—it was about outputs. Should algorithms like std::ranges::copy take a range as the output parameter, or stick with iterators like the serial versions?

Serial range algorithms take an iterator for output. Parallel range algorithms (in the proposal) take a range. This means switching between serial and parallel isn't just adding an execution policy—you might need to restructure your function call. "It's more complicated switch between serial and parallel range algorithms," Arutyunyan acknowledges, summarizing one objection.

But he makes a solid case for the change. First, consistency isn't as sacred as objectors claimed. He points out that std::ranges::copy uses an iterator output while std::ranges::uninitialized_copy uses a range output—the standard already has this inconsistency. More fundamentally, the evolution from C++17 to ranges already happened for inputs. Binary transform in the old algorithms assumed the second sequence was at least as long as the first. The ranges version added sentinels for the second input, allowing proper bounds checking. "We just went further and said hey, why can't output be a range as well?"

The practical benefits: better memory safety, clearer error detection, and slightly better performance when the output is smaller than the input (the algorithm stops early rather than continuing until it hits undefined behavior).

There's no question this is an evolution, not a drop-in replacement. But as Arutyunyan frames it, that ship sailed when ranges diverged from iterators in the first place.

Where This Lands

P3179 has "strong potential" to land in C++26 according to the talk description, though Arutyunyan is careful not to promise anything the committee hasn't voted on yet. The proposal represents a pragmatic middle ground: take what actually works in production implementations, make it memory-safe by default, and don't worry too much about theoretical purity if it doesn't serve real-world parallelism.

The random access requirement eliminates some valid use cases but enables the parallelism people actually need. The sized range requirement trades flexibility for safety and performance. The range-as-output decision breaks perfect API symmetry but makes error handling more explicit and catches bugs at runtime that would otherwise be UB.

These aren't elegant compromises. They're the kind of design choices you make when you're trying to ship something that works on actual hardware with actual threading models, not just in language design papers. Arutyunyan's willingness to surface the tensions—what they're giving up, what they might relax in the future, which design debates had no clear winner—makes the proposal more credible, not less.

The C++ standard is never going to be perfectly consistent. It's a language that's been evolving for four decades, layering new paradigms onto old constraints. What matters is whether each layer makes the things you actually need to do easier and safer. Based on Intel's oneDPL experience and the practical reality of how people implement parallel algorithms, P3179 seems to clear that bar.

— Yuki Okonkwo

Watch the Original Video

Parallel Range Algorithms: The Evolution of Parallelism in C++ - Ruslan Arutyunyan - CppCon 2025


CppCon

58m 58s

About This Source

CppCon

CppCon is a YouTube channel serving as a vital educational hub for C++ programming enthusiasts and professionals. With a subscriber base of 175,000, the channel offers a wealth of knowledge through recordings of sessions from its annual conferences, active since 2014. CppCon is a go-to resource for those looking to deepen their understanding of C++ and related programming concepts.
