A Markdown File Just Changed AI Design Forever
Developer Theo discovers that a simple markdown file turns Claude Opus from worst to best at frontend design. Here's how 'skills' work and what it means.
Written by AI · Zara Chen
February 5, 2026

Photo: Theo - t3.gg / YouTube
If you've used AI to generate frontend designs, you know the aesthetic: purple gradients everywhere, Roboto or Inter fonts, predictable layouts that scream "a robot made this." Developer and YouTuber Theo from t3.gg ranked the major AI models for design capability and put Claude Opus dead last—until he discovered something that completely inverted his rankings.
The secret? A markdown file.
I know that sounds ridiculous. A text file shouldn't transform a model's design capabilities. But Theo's extensive testing suggests it does, and the implications are kind of wild for anyone building with AI.
The Setup: Three Models, Two Treatments
Theo ran a controlled experiment testing Gemini 3 Pro, GPT 5.2, and Claude Opus 4.5 on the same task: generate five unique marketing homepage designs for an image generation app. Each model got two versions of the prompt—one baseline, one using what's called a "frontend design skill."
The skill is literally just a markdown file that lives in Claude's open-source skills repository. It's a set of instructions that tells the model how to approach design work, including this delightfully specific directive:
"Never use generic AI generated aesthetics: overused font families such as Roboto, Inter, or system fonts; cliché color schemes, particularly purple gradients on white backgrounds; predictable layouts and component patterns; and cookie-cutter designs that lack context-specific character."
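Skills in Claude's open-source repository are typically authored as a SKILL.md file with a short YAML frontmatter header followed by plain markdown instructions. The sketch below is illustrative only — the real frontend design skill is longer and worded differently:

```markdown
---
name: frontend-design
description: Guidance for producing distinctive, intentional frontend designs
---

# Frontend Design

## Avoid generic AI aesthetics
- Overused font families: Roboto, Inter, system fonts
- Cliché color schemes, especially purple gradients on white backgrounds
- Predictable layouts and cookie-cutter component patterns

## Aim for intentionality, not intensity
- Pick one clear conceptual direction and execute it with precision
- Vary between light and dark themes, fonts, and aesthetics across designs
```

Because the whole mechanism is just text injected into the model's context, there is nothing proprietary about it: anyone can read, fork, or rewrite a skill.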
Without this skill, Opus produced what Theo called "awful" designs—purple gradients, noise textures, bento boxes that didn't work. "If this is what you were getting for designs out of Opus, I understand why you discounted its design capabilities," he said. "I did the same when I saw this."
With the skill? Opus generated the designs he'd been using for multiple production projects, including his Frame.io alternative and a new image studio app. The same model, dramatically different output.
What Actually Changes
The frontend design skill isn't magic; it's context steering. The markdown file steers the model toward "intentionality, not intensity" and tells it to choose "a clear conceptual direction and execute it with precision." It explicitly instructs models to vary between light and dark themes, different fonts, and different aesthetics.
But here's what gets interesting: the results varied wildly by model.
Gemini 3 Pro performed best at baseline, generating designs with genuine variety and fewer generic templates. Theo noted one design in particular that "looks really cool and nice" with effective drop shadows and spatial choices. Without the skill, Gemini seemed to already have better design sensibilities baked in.
With the skill added, though, Gemini's output became more generic. "This looks like a generic template you would have bought from somebody who was selling Tailwind templates back in the day," Theo said of one skilled-Gemini result. The skill seemed to constrain rather than expand its capabilities.
GPT 5.2 showed the opposite problem: it appeared to ignore the instruction not to use the skill. Theo found evidence in the model's internal logs that it consulted the frontend design skill even when explicitly told not to, making a true baseline comparison impossible. The designs looked consistently similar across both treatments: competent but template-driven.
The Broader Question About Skills
Skills are reusable chunks of context—markdown files that tell models how to behave in specific scenarios. There are skills for using Remotion (a React video library), for specific coding patterns, for various technical implementations. They're open source because they have to be: they're just text instructions.
Theo's discovery raises an uncomfortable question: if a markdown file can transform a model's design output this dramatically, what else are we missing about how to use these tools effectively?
The skill works by essentially giving the model permission and direction to avoid its training biases. AI models learn from existing designs on the internet, which means they learn to reproduce whatever was most common in their training data. Purple gradients, Inter fonts, predictable layouts—these became common because they were common, creating a feedback loop.
The skill breaks that loop by explicitly naming the patterns to avoid and providing alternative framing: "Bold maximalism and refined minimalism both work. The key is intentionality, not intensity."
Theo also shared a prompting hack that improved results across all models: asking for five unique designs in a single request, explicitly instructing the model to make each one different from the others. "When the model within its context is doing multiple different designs with the instruction of making them unique, you're more likely to get unique designs than if you just roll five times because it knows the other four designs," he explained.
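The technique Theo describes can be expressed as a single prompt along these lines — an illustrative paraphrase, not his exact wording:

```text
Generate five marketing homepage designs for an image generation app.
Each design must be unique: use a different layout, typography, color
palette, and overall aesthetic from the other four. Vary between light
and dark themes. Do not reuse fonts or component patterns across designs.
```

The key design choice is keeping all five requests in one context window: the model can compare each new design against the ones it has already produced, which five independent one-shot requests cannot do.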
The Gemini CLI Problem
One consistent thread through Theo's testing: Gemini's command-line interface is, in his words, "so broken." Multiple times during the video, Gemini agents got stuck, crashed, or failed to follow instructions properly.
"Every time I use a Gemini model, I question how anybody uses a Gemini model for anything other than basic chat answers and data parsing," Theo said, watching yet another Gemini process hang. "It just doesn't seem like they did the thing where they trained it on chat histories for these types of CLIs, unlike all of the other models which have definitely done that."
This matters because a model's capabilities only translate to usefulness if the harness—the interface you interact through—actually works. Gemini might have the best baseline design sensibilities, but if the CLI breaks constantly, that advantage evaporates in practice.
What This Means For Building
If you're using AI for frontend work, three things emerge from Theo's testing:
First, the prompting technique of requesting multiple unique variations in a single context window appears to genuinely improve output quality across models. The model can see its own previous work and actively differentiate.
Second, context steering through skills or similar frameworks can dramatically shift model behavior—but not always in the direction you'd expect. What improves Opus constrains Gemini.
Third, the model that works best depends heavily on your specific use case and tolerance for tooling issues. Gemini has strong design instincts but unstable tooling. Opus needs more guidance but can produce excellent results with proper prompting. GPT sits in the middle but might ignore your attempts to control whether it uses skills.
The markdown file that transformed Theo's results is publicly available on GitHub. Whether it'll work as well for you depends on which model you're using, how you're prompting it, and what you're trying to build. But the core insight holds: these models have more capability than their default outputs suggest. Sometimes you just need to tell them to stop making purple gradients.
—Zara Chen
Watch the Original Video
The Best Model For Frontend Design Is...
Theo - t3.gg
31m 36s
About This Source
Theo - t3.gg
Theo - t3.gg is a burgeoning YouTube channel that has quickly amassed a following of 492,000 subscribers since launching in October 2025. Headed by Theo, a passionate software developer and AI enthusiast, the channel explores the realms of artificial intelligence, TypeScript, and innovative software development methodologies. Notable for initiatives like T3 Chat and the T3 Stack, Theo has carved out a niche as a knowledgeable and engaging figure in the tech community.