Playwright CLI vs MCP Server: The Token Usage

The folks at Better Stack just dropped a comparison video that asks a question I've been hearing variations of since 1997: which tool is better? Back then it was Perl vs Python. Then Ruby vs Python. Then Node vs everything. Now it's Playwright CLI vs MCP Server for browser automation with AI coding agents.

The answer, as always, is the one developers hate: it depends.

But here's what's interesting—the video creator actually does something useful. Instead of declaring a winner, they ran the same simple task through both tools: automating a Twitter video download, taking a screenshot, clearing local storage. Real work, not benchmarks designed to prove a point.

The Token Economics

The Playwright CLI completed the task using 16% of available context tokens. The MCP server used 18%. Before you start arguing over it, that gap of two percentage points is not the story here.

What caught my attention is where those tokens go. The MCP server burns 3.6K tokens just loading its tools before it does anything. It's like arriving at a job site with a truck full of equipment you might not need. The CLI, by contrast, shows up lean—68 tokens to get started.

"Notice it only takes up 68 tokens," the creator points out while firing up the CLI. "If we scroll up, we can see that already 15% of the context is being used because all of these MCP tools are being loaded."

That upfront cost matters when you're chaining operations together or working within tight context windows. It matters less if you're doing one-off tasks. See? It depends.
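To put rough numbers on that upfront cost, here is a minimal sketch in Python. The two startup figures are the ones quoted in the video; the function names and the idea that every session starts from an empty context and reloads its tools are my assumptions, not something the video measured.

```python
# Startup figures quoted in the video; everything else here is an assumption.
MCP_STARTUP_TOKENS = 3_600
CLI_STARTUP_TOKENS = 68


def startup_overhead(startup_tokens: int, fresh_sessions: int) -> int:
    """Tokens spent on tool loading across sessions that each start with an empty context."""
    if fresh_sessions < 0:
        raise ValueError("fresh_sessions must be non-negative")
    return startup_tokens * fresh_sessions


def overhead_gap(fresh_sessions: int) -> int:
    """Extra startup tokens the MCP server costs over the CLI for the same number of sessions."""
    mcp_total = startup_overhead(MCP_STARTUP_TOKENS, fresh_sessions)
    cli_total = startup_overhead(CLI_STARTUP_TOKENS, fresh_sessions)
    return mcp_total - cli_total


if __name__ == "__main__":
    # Hypothetical session counts, chosen only to show how the gap scales.
    for sessions in (1, 10, 50):
        print(f"{sessions} session(s): {overhead_gap(sessions)} extra startup tokens")
```

For a single session the gap is a few thousand tokens and easy to ignore. It only turns into an architectural concern when the same fixed cost is paid again and again.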

What You Actually Get

Here's where things get more interesting than token counting. The Playwright CLI gives you everything Playwright can do, out of the box. PDF generation, tracing, the full toolkit. No configuration required.

The MCP server makes you opt into advanced features because exposing them all would consume too much context. It's a design trade-off—the MCP server assumes you want portability and standardization more than you want every possible feature immediately available.

Both approaches make sense for different problems. The CLI assumes you're in a terminal environment and want maximum capability. The MCP server assumes you're building something that needs to run across different environments and wants to play nice with other tools through a standard protocol.

The Human Factor

The video makes a point about the CLI that resonates: it's not just for agents. You can write a bash script that both humans and AI agents can execute. For repetitive tasks or end-to-end testing, that's genuinely useful.

I've been around long enough to remember when we called this "automation" instead of "agentic workflows." The problems are similar, but the context is different. When your automation needs to work with an AI that's burning through tokens, suddenly token efficiency isn't premature optimization—it's architecture.

Where MCP Wins

The MCP server has one clear advantage the CLI can't match: it's a standard protocol. If you're building an agentic loop that needs to run anywhere—browser, desktop, mobile—the MCP server works because it's designed for that portability.

"Because it's a standard protocol that agents use to access tools," the creator explains. "And because Playwright runs JavaScript or TypeScript code, you can run this code in any environment that supports the JavaScript runtime."

Standardization has value. We learned this with REST APIs, with USB-C, with every successful protocol that let different systems talk to each other. If your architecture needs that interoperability, the token overhead might be worth it.

The Third Option

The video teases a third option: Vercel's Agent Browser, which runs Playwright under the hood but with a Rust CLI. Supposedly faster, supposedly lower token usage. The creator promises details in another video.

This is how the industry works now. Every problem spawns multiple solutions, each optimized for different constraints. Twenty years ago we'd have one tool that did everything badly. Now we have three tools that each do one thing well, and you need to understand your actual requirements to pick the right one.

The Permission Problem

One detail the video mentions in passing: "I've noticed the MCP server asks for way more permissions than the CLI does."

This isn't trivial. Permission models matter for security, for user trust, for compliance. If you're building something for an enterprise environment, permission sprawl is a real problem. The CLI's lighter permission footprint might matter more than token efficiency depending on your deployment constraints.

Configuration as Optimization

The MCP server lets you toggle tools on and off to reduce token usage. The CLI just gives you everything. This reflects different philosophies about configuration.

The CLI approach: sane defaults that cover most use cases, and if you need less, you know how to ignore what you don't need. The MCP approach: expose configuration because different contexts demand different trade-offs.
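The toggle-to-save-tokens idea can be sketched as a sum over whatever tools are switched on. This is an illustration only: the tool names and per-tool costs below are placeholders I made up, not measurements from the video and not the MCP server's actual tool list.

```python
def loaded_tool_tokens(tool_costs: dict[str, int], enabled: set[str]) -> int:
    """Sum the context cost of the tools that are switched on."""
    unknown = enabled.difference(tool_costs)
    if unknown:
        raise KeyError(f"no cost recorded for: {sorted(unknown)}")
    return sum(tool_costs[name] for name in enabled)


# Placeholder names and costs, invented for illustration only.
EXAMPLE_COSTS = {"navigate": 100, "screenshot": 200, "pdf": 300, "tracing": 400}

if __name__ == "__main__":
    everything = set(EXAMPLE_COSTS)
    trimmed = {"navigate", "screenshot"}
    print("all tools:", loaded_tool_tokens(EXAMPLE_COSTS, everything))
    print("trimmed:", loaded_tool_tokens(EXAMPLE_COSTS, trimmed))
```

The arithmetic is trivial, which is the point. The real work is deciding which tools a given workflow can live without.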

Neither is wrong. Both reflect assumptions about their users and use cases.

What This Actually Reveals

Here's what I find interesting about this comparison: it's not really about which tool is better. It's about the architecture decisions you're making when you choose how your AI agents interact with browsers.

Are you building for a single environment or multiple platforms? Do you value having every feature immediately available or keeping context windows lean? Are you optimizing for human readability of your automation scripts or for standardized agent-to-tool communication?

The Better Stack video walks through one task with both tools. Both completed it. The CLI did it with slightly less token overhead and no screenshot issues. The MCP server did it with more permission requests and a couple of stumbles, but in a way that would port more easily to other environments.

They tested with a simple task: download a video, wait, screenshot, clean up. In production, your tasks are messier. Your constraints are different. Your team's familiarity with one approach over another might matter more than a two-point difference in context usage.

The real lesson here isn't which tool wins. It's that when someone asks "which is better," the first question should always be: better for what?

—Mike Sullivan, Technology Correspondent