
Google's Imagen 2 Fills the Gap Between Cheap and Good

Google's new Imagen 2 model balances quality and cost for AI image generation, excelling at text rendering and multi-reference consistency.

Written by Yuki Okonkwo, an AI editorial voice.

February 27, 2026


Photo: Sam Witteveen / YouTube

Google just released what might be the Goldilocks of AI image generation models, and honestly, the timing feels deliberate.

The new model—officially called Gemini 3.1 Flash but nicknamed "Imagen 2" by the community, because naming conventions in AI are apparently suggestions—slots right between Google's budget option and its premium offering. It's cheaper than Imagen Pro, better than the original Flash model, and good enough at rendering text that it might actually be useful for real work.

Which raises an interesting question: when does "almost as good" become good enough?

The Text Thing Actually Matters

AI researcher Sam Witteveen tested the model with a prompt that sounds ridiculous until you think about how often text-in-images breaks: a cat holding a sign saying "on strike, not catching mice today."

The model nailed it. Then he asked it to add toy mice to the scene. Nailed that too. Then he asked it to make the mouse real and have it hold a sign saying "I support the cats." Still worked.

"One of the things that you will find is that this model is extremely good at text," Witteveen noted in his walkthrough. "So if you look at it, it gets all the text there fine."

This might seem like table stakes, but text rendering has been AI image generation's awkward phase for years. Models that could paint photorealistic landscapes would somehow produce signs that looked like they were written by someone having a stroke. The fact that Imagen 2 handles multilingual text translation (English to Italian to Thai in Witteveen's demo) while maintaining coherent letterforms is... actually noteworthy.

Though he did notice something curious: "It does seem to me that certain languages have a lot of different fonts and will sort of adjust to the original font that you used more than other languages." The Thai version lost the handwritten aesthetic of the English original, though he didn't explicitly prompt for that style to be maintained.

Where It Falls Short (And Why That Matters)

Witteveen ran the same prompts through both Imagen Pro and Imagen 2 to see where the cheaper model stumbles. The results were... mixed in an instructive way.

For a prompt requesting "an unremarkable, unintentional shot, a selfie with a caveman with a T-Rex running behind him," Imagen Pro produced a convincing selfie perspective. Imagen 2's version showed the camera in the caveman's hand—which breaks the illusion of it being an actual selfie photo.

"This is not from the perspective of the selfie if we can actually see the camera in his hand," Witteveen pointed out.

When asked for a top-down view, Imagen 2 got the concept but placed the T-Rex right next to the caveman instead of behind him. It's the kind of spatial reasoning hiccup that suggests the model is cheaper for a reason—it's doing less processing, making more approximations.

But here's the thing: Witteveen noted he "didn't use any sort of description of the environment or anything like that." With more detailed prompting, the gap might shrink. Which means the real question isn't whether Imagen 2 matches Imagen Pro's quality—it's whether the quality difference matters for your specific use case.

The Reference Image Flex

The feature that made me sit up: Imagen 2 can now handle up to 14 reference images for a single generation.

Witteveen demonstrated this with a farm scene—multiple characters, trucks, a tractor, a barn—all fed as references, all appearing coherently in the final generated image. "This is a big win for if you want to do anything with products," he explained. "If you want to do things where you've got multiple references of products in the same product range and you want to actually have all of them in the picture, Imagen 2 actually allows you to do that natively."

This matters for e-commerce, marketing, anyone trying to visualize products in contexts without expensive photoshoots. It's also interesting that Google is positioning the mid-tier model as the product consistency workhorse rather than making it a premium-only feature.
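That 14-reference ceiling is the kind of limit a product pipeline has to plan around. As a rough sketch (nothing here is from Google's API; the only detail taken from the video is the 14-image cap), a helper that splits a larger product catalog into request-sized batches might look like:

```python
def batch_references(image_paths, max_refs=14):
    """Split reference image paths into groups that fit the
    per-request limit (14 references, per Witteveen's demo)."""
    if max_refs < 1:
        raise ValueError("max_refs must be at least 1")
    return [image_paths[i:i + max_refs]
            for i in range(0, len(image_paths), max_refs)]

# 30 product shots become three requests: 14, 14, and 2 references.
batches = batch_references([f"product_{n}.png" for n in range(30)])
```

Anything over 14 references would still take multiple generations stitched together, which is exactly the kind of workaround the native support removes.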

The model also includes an image search tool that Imagen Pro doesn't have (though Pro has standard Google search grounding). This creates an odd capability matrix where the cheaper model has features the expensive one lacks. It suggests Google is segmenting not just on quality but on use case—different tools for different jobs rather than a straight upgrade path.

The Production Economics Question

Pricing-wise, Imagen 2 sits between the original Flash model and Imagen Pro—not as cheap as the budget option, not as expensive as the premium. For people who found Imagen Pro too costly for production use, this positioning is clearly intentional.

"It's certainly a win if you were sort of looking at using Imagen Pro in production but it was just too expensive to do at scale," Witteveen said.

This gets at something interesting about AI model releases in 2026: we're past the phase of pure capability races. Now we're in the phase of market segmentation, where the question isn't just "can it do X?" but "can it do X at a price point that makes sense for this workflow?"

Imagen 2 will roll out across Google's apps—Gemini, AI Studio, Vertex AI—which means widespread availability for different user types. The developer who needs to generate thousands of product mockups has different economics than the designer making one hero image.
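The arithmetic behind that segmentation is simple. The per-image rates below are placeholders (the video quotes no actual prices), but they show how quickly a small per-image gap compounds at catalog scale:

```python
def generation_cost_cents(num_images, cents_per_image):
    """Total batch cost in cents; integer math avoids
    float rounding on money."""
    return num_images * cents_per_image

# Hypothetical per-image rates for illustration only.
PRO_CENTS, MID_CENTS = 12, 4

mockups = 10_000  # e.g. one product-catalog refresh
savings_cents = (generation_cost_cents(mockups, PRO_CENTS)
                 - generation_cost_cents(mockups, MID_CENTS))
# At these placeholder rates, the mid-tier saves $800 per 10,000 images.
```

For the one-off hero image, an 8-cent difference is noise; for the mockup pipeline, it's the whole budget argument.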

What This Tells Us About Where We Are

The existence of Imagen 2 as a distinct product reveals something about the current state of AI image generation: the technology has matured enough that "good enough" is a viable category.

Two years ago, any model that could reliably render text would have been revolutionary. Now we're debating whether slightly worse spatial reasoning is worth the cost savings. That's not a criticism—it's a sign of how rapidly the baseline has shifted.

Witteveen ended his video noting that Imagen 2's release "has got to bode well for the people who are waiting for Gemini 3.1 Flash model." The implication: if the image model at this tier is this capable, the text model might be similarly positioned as the practical middle option.

Which brings us back to the central tension: when building with AI, you're constantly triangulating between capability, cost, and "good enough for this specific thing." Imagen 2 exists because that calculation differs wildly depending on what you're building.

The question isn't whether it's better than Imagen Pro. It's whether the difference matters for what you're trying to do. And sometimes—maybe increasingly—the answer is no.

—Yuki Okonkwo

Watch the Original Video

Nano Banana 2 - Smaller, Faster, Cheaper


Sam Witteveen

6m 22s
Watch on YouTube

About This Source

Sam Witteveen


Sam Witteveen, a prominent figure in artificial intelligence, engages a substantial YouTube audience of over 113,000 subscribers with his expert insights into the world of deep learning. With more than a decade of experience in the field and five years focusing on Transformers and Large Language Models (LLMs), Sam has been a Google Developer Expert for Machine Learning since 2017. His channel is a vital resource for AI enthusiasts and professionals, offering a deep dive into the latest trends and innovations in AI, such as Nvidia models and autonomous agents.

