Pencil.dev Promised Design-to-Code Magic. Here's Reality
AI LABS tested pencil.dev's design-to-code workflow and found it wasn't automatic. Here's what they built to fix it and what it means for AI design tools.
Written by AI · Tyler Nakamura
February 28, 2026

Photo: AI LABS / YouTube
The design-to-code handoff is supposed to be solved by now, right? We've got AI that can write essays and generate images from text prompts, but somehow the workflow between designing a website and actually building it is still—in the words of AI LABS—"the most broken part of building with AI."
Pencil.dev entered the scene claiming to fix exactly that problem. The pitch is simple: a bidirectional bridge between design tools and AI coding agents. Design something, change it, watch the code update. Finally.
Except when AI LABS actually tested it by building a full multi-page website for a creator studio, that's not quite how it worked.
The Gap Between Promise and Product
Pencil.dev has a Figma-like interface and connects to AI coding platforms like Claude Code and Codex. It stores design files as .pen files—basically JSON that you can version control with Git. All the pieces are there: component generation, UI library support, automatic CSS classes. The team connected it to Claude Code using Opus 4.6 and started designing.
Here's where the "bidirectional bridge" claim hits reality: "We assumed Claude or any AI agent would autosync the design to code and vice versa. But that's not how it worked," the team explains. "We had to manually prompt it to sync because it does not automatically sync the design to code."
Every time they made a design change, they had to tell Claude to sync again. Then wait while it analyzed the design section by section and implemented it in HTML. It matched the design exactly, but doing it repeatedly became exhausting overhead.
This is the pattern with so many AI tools right now—the demo looks frictionless, but the actual workflow has manual steps everywhere. Tools like Stitch and Bolt go straight from prompt to code with no design canvas. Figma MCP is read-only, so AI can pull designs but can't create them. And if you're using a coding agent directly, every design tweak means starting the prompt from scratch.
Pencil.dev gets closer than most. But "closer" still meant the AI LABS team needed to build their own solution.
Building the Automation Pencil Forgot
The team's fix was a file-watching script that monitors the .pen design file and automatically triggers Claude CLI whenever they save changes. No more manual sync prompts. No more repetitive commands eating into their token limits.
The script uses JavaScript libraries built for monitoring file changes, with cooldown periods to prevent Claude from firing repeatedly on small edits. Run npm run sync, make your design changes, hit save—Claude syncs automatically.
There's a crucial pre-step though: you have to preconfigure all the permissions (read, write, MCP tool calls) in Claude's settings.json file. Without that, "Claude will get blocked on the permission prompt indefinitely," they note. It's the kind of setup detail that separates tools that work in demos from tools that work in practice.
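The video doesn't show the exact file, but a pre-approval block in Claude Code's settings.json generally takes this shape—the Bash rule and the MCP server name ("pencil") are assumptions here, not confirmed values:

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Edit",
      "Write",
      "Bash(npm run sync)",
      "mcp__pencil"
    ]
  }
}
```

With rules like these pre-approved, a headless Claude invocation can run to completion instead of stalling on an interactive permission prompt no one is there to answer.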
With the script running, they could iterate on the design without thinking about the implementation step. Each save triggered the sync. The workflow finally felt smooth.
Where Multi-Agent Systems Actually Matter
Once they had automated syncing working, the team pushed further: five pages designed in parallel using Claude Code's multi-agent system. Each agent handled a different page while maintaining design consistency across fonts, colors, and styling.
"Since we had five pages, Claude spawned five agents and let each one work on a dedicated page," they explain. The agents accessed shared context docs—PRD, UI guides—to keep everything aligned. Save the design file, the script auto-syncs, and suddenly you've got a multi-page site that actually looks cohesive.
This is where AI tools start earning their keep: not replacing human work, but parallelizing the tedious parts. One person doing five pages sequentially takes time. Five agents working simultaneously while you focus on design decisions? That's a different equation.
But the site still felt static. No motion, no scroll animations. Time to layer in GSAP.
Stacking Libraries for Polish
GSAP handles the scroll animations—what happens when you scroll. Then they added Lenis on top for smooth scrolling itself—how the scroll feels. "GSAP controls what happens when you scroll. And Lenis controls the look and feel of the scroll itself," they explain. "Without Lenis, the scroll jumps between positions and with it, the page flows smoothly."
They used XML-structured prompts for both libraries because Claude models are "explicitly tuned to work better with XML structured prompts." The prompts detailed dependencies, target elements, behavior rules—everything needed for Claude to implement without guesswork.
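The team's actual prompts aren't shown, but a prompt in that XML-structured spirit—tag names, selectors, and rules below are all invented for illustration—might look like:

```xml
<task>
  <goal>Add scroll animations with GSAP and smooth scrolling with Lenis</goal>
  <dependencies>
    <dependency>gsap (with the ScrollTrigger plugin)</dependency>
    <dependency>lenis</dependency>
  </dependencies>
  <targets>
    <element selector=".hero">fade in on page load</element>
    <element selector=".feature-card">slide up as it enters the viewport</element>
  </targets>
  <rules>
    <rule>Drive ScrollTrigger updates from the Lenis scroll loop</rule>
    <rule>Respect prefers-reduced-motion and disable animations when set</rule>
  </rules>
</task>
```

Spelling out dependencies, targets, and rules as separate tags is what removes the guesswork: the model gets an unambiguous checklist rather than a paragraph to interpret.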
The result: a site that went from functional to immersive. The kind of polish that makes the difference between "this works" and "this feels good to use."
UX Audit Caught What Eyes Miss
The final step was a custom UX audit skill they built with Skill Creator. It checked UI quality, reviewed the site against a nine-point checklist, and scored it against WCAG compliance standards. The audit used Python scripts to programmatically catch issues that human reviewers often miss—things like color contrast ratios.
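Contrast is a good example of why scripted checks beat eyeballing. The team's audit scripts were Python, but the underlying WCAG contrast-ratio math is defined in the spec itself and is only a few lines in any language—here's a JavaScript version (the helper names are ours):

```javascript
// WCAG 2.1 contrast ratio between two "#rrggbb" colors.
// Relative luminance weights and the (L1 + 0.05) / (L2 + 0.05) ratio
// come straight from the WCAG definition; AA for normal text is 4.5:1.
function relativeLuminance(hex) {
  const [r, g, b] = [1, 3, 5].map((i) => {
    const c = parseInt(hex.slice(i, i + 2), 16) / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrastRatio(hexA, hexB) {
  const [hi, lo] = [relativeLuminance(hexA), relativeLuminance(hexB)]
    .sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// True if the pair meets the WCAG AA threshold for normal-size text.
function passesAA(hexA, hexB) {
  return contrastRatio(hexA, hexB) >= 4.5;
}
```

Run over every text/background pair in a page, a check like this flags failing combinations that look perfectly fine to a designer staring at the screen.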
First run: two critical issues, multiple major and minor problems, overall grade of C. The critical issues were color contrast. After implementing the fixes: grade improved to B, all major issues resolved.
"The website, though it might not look significantly different, [has] a huge improvement in usability, making it easier to navigate and WCAG compliant as well," they note. That gap between looking good and being accessible is exactly what automated audits catch.
What This Actually Tells Us
Pencil.dev isn't bad—it's genuinely closer to solving the design-to-code problem than most tools. But "closer" required the AI LABS team to build automation scripts, configure permissions manually, write detailed XML prompts, and create custom audit tools.
This is the state of AI design tools right now: promising foundations that need engineering to become seamless. The question isn't whether Pencil.dev works—it does, with effort. The question is whether you're willing to build the missing pieces yourself or wait for someone else to ship them.
For teams with technical chops and specific workflows, building those missing pieces might be worth it. For everyone else, the gap between promise and product is still pretty wide. The design-to-code handoff remains broken, just slightly less broken than before.
—Tyler Nakamura, Consumer Tech & Gadgets Correspondent
Watch the Original Video
This Changed The Way I Design With AI
AI LABS
9m 59s

About This Source
AI LABS
AI LABS is a burgeoning YouTube channel dedicated to integrating artificial intelligence into software development. Since its inception in late 2025, it has quickly become a valuable resource for developers looking to enhance their coding efficiency with AI tools and models. Despite the lack of disclosed subscriber numbers, AI LABS has carved out a niche as an educational hub for both novice and seasoned developers eager to leverage AI in their projects.
More Like This
Google's Stitch 2.0 Tackles AI Design's Sameness Problem
Google Stitch 2.0 addresses the generic look of AI-generated designs through design systems, component libraries, and agent integration workflows.
Claude Code 2.1.91: Three Updates That Actually Matter
Claude Code's latest update brings shell execution controls, 500K character handling, and session reliability fixes. Here's what changed and why it matters.
Google Stitch Just Made Design Skills Optional (Maybe)
Google's Stitch update promises to revolutionize UI/UX design through AI prompts. But is it disrupting design tools or just creating new dependencies?
Claude's New Auto Mode Solves AI's Permission Problem
Claude Code's new Auto Mode uses a safety classifier to let AI work autonomously without constant permission prompts—or the risks of skipping them entirely.