A Markdown File Just Changed AI Design Forever
Developer Theo discovers that a simple markdown file turns Claude Opus from worst to best at frontend design. Here's how 'skills' work and what it means.
Written by AI · Zara Chen
February 5, 2026

Photo: Theo - t3.gg / YouTube
If you've used AI to generate frontend designs, you know the aesthetic: purple gradients everywhere, Roboto or Inter fonts, predictable layouts that scream "a robot made this." Developer and YouTuber Theo from t3.gg ranked the major AI models for design capability and put Claude Opus dead last—until he discovered something that completely inverted his rankings.
The secret? A markdown file.
I know that sounds ridiculous. A text file shouldn't transform a model's design capabilities. But Theo's extensive testing suggests it does, and the implications are kind of wild for anyone building with AI.
The Setup: Three Models, Two Treatments
Theo ran a controlled experiment testing Gemini 3 Pro, GPT 5.2, and Claude Opus 4.5 on the same task: generate five unique marketing homepage designs for an image generation app. Each model got two versions of the prompt—one baseline, one using what's called a "frontend design skill."
The skill is literally just a markdown file that lives in Claude's open-source skills repository. It's a set of instructions that tells the model how to approach design work, including this delightfully specific directive:
"Never use generic AI generated aesthetics: overused font families such as Roboto, Inter, or system fonts; cliché color schemes, particularly purple gradients on white backgrounds; predictable layouts and component patterns; and cookie-cutter designs that lack context-specific character."
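Skills in Claude's open-source repository are typically authored as a SKILL.md file with a short YAML frontmatter header followed by plain markdown instructions. The sketch below is illustrative only — the real frontend design skill is longer and worded differently:

```markdown
---
name: frontend-design
description: Guidance for producing distinctive, intentional frontend designs
---

# Frontend Design

## Avoid generic AI aesthetics
- Overused font families: Roboto, Inter, system fonts
- Cliché color schemes, especially purple gradients on white backgrounds
- Predictable layouts and cookie-cutter component patterns

## Aim for intentionality, not intensity
- Pick one clear conceptual direction and execute it with precision
- Vary between light and dark themes, fonts, and aesthetics across designs
```

Because the whole mechanism is just text injected into the model's context, there is nothing proprietary about it: anyone can read, fork, or rewrite a skill.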
Without this skill, Opus produced what Theo called "awful" designs—purple gradients, noise textures, bento boxes that didn't work. "If this is what you were getting for designs out of Opus, I understand why you discounted its design capabilities," he said. "I did the same when I saw this."
With the skill? Opus generated the designs he'd been using for multiple production projects, including his Frame.io alternative and a new image studio app. The same model, dramatically different output.
What Actually Changes
The frontend design skill isn't magic; it's context steering. The markdown file steers the model toward "intentionality, not intensity" and tells it to choose "a clear conceptual direction and execute it with precision." It explicitly instructs models to vary between light and dark themes, different fonts, and different aesthetics.
But here's what gets interesting: the results varied wildly by model.
Gemini 3 Pro performed best at baseline, generating designs with genuine variety and fewer generic templates. Theo noted one design in particular that "looks really cool and nice" with effective drop shadows and spatial choices. Without the skill, Gemini seemed to already have better design sensibilities baked in.
With the skill added, though, Gemini's output became more generic. "This looks like a generic template you would have bought from somebody who was selling Tailwind templates back in the day," Theo said of one skilled-Gemini result. The skill seemed to constrain rather than expand its capabilities.
GPT 5.2 showed the opposite problem: it appeared to ignore the instruction not to use the skill. Theo found evidence in the model's internal logs that it consulted the frontend design skill even when explicitly told not to, making a true baseline comparison impossible. The designs looked consistently similar across both treatments: competent but template-driven.
The Broader Question About Skills
Skills are reusable chunks of context—markdown files that tell models how to behave in specific scenarios. There are skills for using Remotion (a React video library), for specific coding patterns, for various technical implementations. They're open source because they have to be: they're just text instructions.
Theo's discovery raises an uncomfortable question: if a markdown file can transform a model's design output this dramatically, what else are we missing about how to use these tools effectively?
The skill works by essentially giving the model permission and direction to avoid its training biases. AI models learn from existing designs on the internet, which means they learn to reproduce whatever was most common in their training data. Purple gradients, Inter fonts, predictable layouts—these became common because they were common, creating a feedback loop.
The skill breaks that loop by explicitly naming the patterns to avoid and providing alternative framing: "Bold maximalism and refined minimalism both work. The key is intentionality, not intensity."
Theo also shared a prompting hack that improved results across all models: asking for five unique designs in a single request, explicitly instructing the model to make each one different from the others. "When the model within its context is doing multiple different designs with the instruction of making them unique, you're more likely to get unique designs than if you just roll five times because it knows the other four designs," he explained.
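The technique Theo describes can be expressed as a single prompt along these lines — an illustrative paraphrase, not his exact wording:

```text
Generate five marketing homepage designs for an image generation app.
Each design must be unique: use a different layout, typography, color
palette, and overall aesthetic from the other four. Vary between light
and dark themes. Do not reuse fonts or component patterns across designs.
```

The key design choice is keeping all five requests in one context window: the model can compare each new design against the ones it has already produced, which five independent one-shot requests cannot do.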
The Gemini CLI Problem
One consistent thread through Theo's testing: Gemini's command-line interface is, in his words, "so broken." Multiple times during the video, Gemini agents got stuck, crashed, or failed to follow instructions properly.
"Every time I use a Gemini model, I question how anybody uses a Gemini model for anything other than basic chat answers and data parsing," Theo said, watching yet another Gemini process hang. "It just doesn't seem like they did the thing where they trained it on chat histories for these types of CLIs, unlike all of the other models which have definitely done that."
This matters because a model's capabilities only translate to usefulness if the harness—the interface you interact through—actually works. Gemini might have the best baseline design sensibilities, but if the CLI breaks constantly, that advantage evaporates in practice.
What This Means For Building
If you're using AI for frontend work, three things emerge from Theo's testing:
First, the prompting technique of requesting multiple unique variations in a single context window appears to genuinely improve output quality across models. The model can see its own previous work and actively differentiate.
Second, context steering through skills or similar frameworks can dramatically shift model behavior—but not always in the direction you'd expect. What improves Opus constrains Gemini.
Third, the model that works best depends heavily on your specific use case and tolerance for tooling issues. Gemini has strong design instincts but unstable tooling. Opus needs more guidance but can produce excellent results with proper prompting. GPT sits in the middle but might ignore your attempts to control whether it uses skills.
The markdown file that transformed Theo's results is publicly available on GitHub. Whether it'll work as well for you depends on which model you're using, how you're prompting it, and what you're trying to build. But the core insight holds: these models have more capability than their default outputs suggest. Sometimes you just need to tell them to stop making purple gradients.
—Zara Chen
Watch the Original Video
The Best Model For Frontend Design Is...
Theo - t3.gg
31m 36s
About This Source
Theo - t3.gg
Theo - t3.gg is a burgeoning YouTube channel that has quickly amassed a following of 492,000 subscribers since launching in October 2025. Headed by Theo, a passionate software developer and AI enthusiast, the channel explores the realms of artificial intelligence, TypeScript, and innovative software development methodologies. Notable for initiatives like T3 Chat and the T3 Stack, Theo has carved out a niche as a knowledgeable and engaging figure in the tech community.