ChatGPT Images 2.0 vs. Nano Banana: Where Text

The AI image generation space just got more interesting. ChatGPT Images 2.0 launched this week, and according to extensive testing by Futurepedia, it's doing something competitors still can't quite manage: generating text that actually says what you asked for.

That might sound like table stakes: you type words, the AI renders those words. But anyone who's tried to get an image generator to spell "Happy Birthday" correctly on a cake knows it's been a persistent blind spot. The new model from OpenAI appears to have cracked it, at least well enough to shift the competitive landscape.

Where the gaps showed up

The testing focused heavily on head-to-head comparisons with what the video calls "Nano Banana," the widely used nickname for Google's Gemini image-generation model. The tester ran dozens of prompts through both systems, from simple photorealism to complex infographics requiring research and factual accuracy.

Text rendering emerged as the clearest differentiator. A parody movie poster prompt produced perfect fine print in ChatGPT ("Music by Binary Bard, edited by Cut and Code, production design by Pixel and Pine"), while the Nano Banana version turned the same text into aesthetic gibberish. An alphabet grid matching animals to letters (aardvark, bear, cat, and so on through zebra) also came out perfect in ChatGPT. Competing models have consistently fumbled that layout, because 26 doesn't divide into a tidy near-square grid, so some row always ends up short.

The most striking example was a detailed ComfyUI workflow screenshot, complete with node connections, parameter settings, and technical terminology. ChatGPT rendered it with only minor errors in the connecting lines. Nano Banana's attempt had "text issues all over the place," according to the tester.

The research mode wildcard

ChatGPT Images 2.0 includes something unexpected: a thinking mode that can spend several minutes researching before it generates anything. Ask for an infographic comparing AI video model architectures, and it will search for technical documentation, evaluate sources, plan the layout, and only then create the image.

One test had it research the 2026 Toyota Sienna's trim levels and generate a comparison chart. When fact-checked against Toyota's published specs, ChatGPT's version held up. Nano Banana's was prettier but left out an entire trim level (the Woodland Edition) and included incorrect details, such as a seven-seat configuration that should have been eight.

This hints at a philosophical fork in how these tools might evolve. Is an image generator a creative tool for aesthetic output, or is it becoming something closer to a visual research assistant that happens to also handle design? ChatGPT seems to be betting on the latter.

Where aesthetics still matter

Style transfer (taking the look of one image and applying it to new subjects) went to Nano Banana in the testing. Given a colorful illustrated bear and asked to create a bighorn sheep in the same style, Nano Banana matched it perfectly. ChatGPT "produced a cool image, but definitely not the original style."

This tracks with how OpenAI's image tools have generally felt: utilitarian, focused on instruction-following and practical applications rather than on faithfully carrying over a reference image's look. Google's model held the edge on that kind of visual consistency here. Neither approach is wrong; they're optimizing for different use cases.

The tester noted one particularly useful trick: adding the word "photorealism" to prompts dramatically improved output quality in ChatGPT. "Every model has different tendencies like that," they explained. "Sometimes it takes experimentation to get what you want." It's a reminder that, for all their sophistication, these tools still require learning their quirks. They aren't yet truly natural-language interfaces.

The manufactured screenshot problem

One test deserves special attention for its implications: recreating user interface screenshots. ChatGPT generated pixel-perfect mockups of Reddit comment threads, Midjourney's explore page, and even ComfyUI workflows. Each Reddit comment had a unique username and profile picture, and the Midjourney recreation looked as if it were populated with actual Midjourney-generated images.

The tester noted: "We are definitely at the point where you cannot trust any images online."

This isn't new information (we've known for a while that synthetic media was getting convincing), but seeing it demonstrated with UI elements raises specific concerns. Screenshots have long served as evidence, documentation, and proof of behavior. That social function gets complicated when anyone can generate a convincing fake in seconds.

There's no putting this genie back in the bottle, obviously. But it's worth noting that as these models get better at the thing we've been asking them to do (follow our instructions precisely), they also get better at the thing we're worried about (generating convincing fakes of anything).

What the competitive math looks like now

The tester's conclusion: "ChatGPT won most of the time. Not in everything, so I'll still use both tools."

That pragmatic approach probably describes where most serious users will land. For infographics, technical documentation, and anything requiring factual accuracy or complex text, ChatGPT appears to have pulled ahead. For style matching, artistic work, or when you want something beautiful without worrying about whether the fine print is readable, Nano Banana still delivers.

The text accuracy gap matters more than it might seem. Infographics, charts, educational materials, and technical diagrams aren't niche use cases; they're how information gets communicated in business, education, and media. If one tool can generate them reliably while its competitors can't, that's not a feature difference. It's a capability gap that shapes what's possible.

The question now is whether Google will prioritize closing that gap or double down on what Nano Banana does distinctively well. Both strategies have merit. The worst outcome would be abandoning those strengths to chase feature parity, ending up with two tools that do the same things equally well, which is to say, without distinction.

OpenAI keeps making their tools more useful in conventional ways. That's either the smart play or the boring one, depending on what you value. Either way, if you've been waiting for AI image generators to reliably spell words correctly, that wait might finally be over.

—Marcus Chen-Ramirez