
Gemini 3.1 Flash Lite's Web Scraping Accuracy Has a Catch

Gemini 3.1 Flash Lite hits 100% URL accuracy for $4/1,000 pages. That's the pitch. Here's what the benchmark doesn't tell you.

Written by AI. Rachel "Rach" Kovacs

May 9, 2026 · 6 min read
Comparison chart showing Gemini Flash at $0.25/M versus Firecrawl at $0.83/M with benchmark results highlighting cost…

Photo: AI. Ren Takahashi

Google's newest lightweight model reportedly scrapes the web with 100% link accuracy at roughly $4 per thousand pages. If you run any kind of automated data pipeline, that number just made you sit up straighter. If you own a website, it should also make you a little uncomfortable.

Both reactions are reasonable. They're also not mutually exclusive.

Hamish from the Income Stream Surfers YouTube channel published a hands-on test this week of Gemini 3.1 Flash Lite, Google's new small model, described in its own documentation as "designed for lightweight agentic workflows, simple data extraction, and applications where responsiveness and API costs are the primary constraints." Hamish's results are striking enough to warrant coverage. So is the context the video leaves out.

A note before the numbers: Hamish refers throughout his video to a model he calls "GPT-5 Nano" as a comparison baseline. As of this writing, OpenAI has not publicly released a model by that name. It's possible Hamish is using an internal label, a preview model not publicly announced, or a name that has not yet been independently verified. I'm flagging this because the hallucination figure he cites for this unverified model, roughly 50% of links, is used as a benchmark throughout the video, and the comparison only holds if we know what we're actually comparing. Similarly, Hamish cites a May 7th release date for Flash Lite and claims a 1 million token context window; both are plausible but sourced entirely from his video. Verify against Google's official model documentation before you build anything critical on those specs.

What he actually tested

The stack is straightforward: feed a URL to Jina Reader (a service that converts web pages to clean markdown), then pass that markdown to Gemini 3.1 Flash Lite with a prompt asking it to extract structured JSON—links, images, product data, whatever you're after. No complex infrastructure. No Firecrawl subscription.
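A minimal sketch of that two-step flow, under stated assumptions. It assumes Jina Reader is reached by prefixing the target URL with `https://r.jina.ai/`, and it treats `call_model` as a hypothetical hook for whichever Gemini client call you use. The prompt text, the JSON shape, and the function names are illustrative and are not taken from the video.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Assumption: the reader service returns markdown when the target URL
# is appended to this prefix.
JINA_READER_PREFIX = "https://r.jina.ai/"

# Illustrative prompt; not the prompt used in the video.
PROMPT = (
    "Extract every link and image URL from the markdown below. "
    'Respond with JSON only, shaped like {"links": [], "images": []}. '
    "Copy URLs exactly as written and do not invent any.\n\n"
)


def fetch_markdown(page_url: str, timeout: float = 30.0) -> str:
    """Fetch a page as markdown through the reader service."""
    with urllib.request.urlopen(JINA_READER_PREFIX + page_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_page(
    page_url: str,
    call_model: Callable[[str], str],
    fetch: Callable[[str], str] = fetch_markdown,
) -> Dict[str, List[str]]:
    """Run reader -> model -> JSON and return the two URL lists."""
    markdown = fetch(page_url)
    raw = call_model(PROMPT + markdown)  # hypothetical hook for your model client
    data = json.loads(raw)               # raises ValueError if the reply is not JSON
    return {
        "links": [str(u) for u in data.get("links", [])],
        "images": [str(u) for u in data.get("images", [])],
    }
```

Passing the fetcher and the model call in as arguments keeps the sketch independent of any particular SDK and makes it easy to exercise with stubs before spending money on real requests.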

Hamish ran this against a real site, 2min.it, and came back with 96 out of 96 URLs returning HTTP 200—meaning every single link the model returned was a real, live URL. He also got 36 out of 36 images verified as genuine. In his own words: "To have 100% real URLs here is extremely interesting at a very, very good price."
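A check along these lines is easy to reproduce. The sketch below is one way to do it, not Hamish's script: it sends a HEAD request to each returned URL and counts a link as live only when the final status is 200. The function names and the User-Agent string are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if no response arrives."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code   # the server answered, but with an error status
    except (OSError, ValueError):
        return 0          # DNS failure, timeout, refused connection, malformed URL


def verify_links(
    urls: Iterable[str],
    status_of: Callable[[str], int] = http_status,
) -> Tuple[List[str], List[str]]:
    """Split URLs into (live, dead), where live means the status was 200."""
    live: List[str] = []
    dead: List[str] = []
    for url in urls:
        (live if status_of(url) == 200 else dead).append(url)
    return live, dead
```

Some servers reject HEAD requests or block unfamiliar user agents, so a non-200 result here is a prompt to look closer rather than proof that the model invented the link.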

For anyone who has run LLM scraping pipelines before, that accuracy figure is legitimately notable. Language models have a well-documented tendency to confabulate plausible-looking URLs that go nowhere, a maddening failure mode when you're trying to build a link database or extract product catalog data. Hamish says that with a different lightweight model he previously used for this task, roughly 50% of returned links were hallucinated; he then upgraded to Gemini 3 Flash and, by his account, did no better, with about 75% hallucination on the same task. Flash Lite apparently fixed what its bigger sibling couldn't. That's the kind of engineering outcome that actually matters in production.
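One way to put a number on link hallucination without touching the network is to compare the URLs the model returns against the URLs that actually appear in the source markdown. The sketch below assumes absolute `http(s)` URLs and uses a deliberately simple regular expression; the names are hypothetical, and this is not necessarily how the figures in the video were measured.

```python
import re
from typing import Iterable, List, Tuple

# Simple pattern for absolute URLs; stops at whitespace, brackets, and quotes.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def urls_in_source(markdown: str) -> set:
    """Collect the absolute URLs that literally appear in the markdown."""
    return {match.rstrip(".,;") for match in URL_PATTERN.findall(markdown)}


def hallucination_rate(returned: Iterable[str], markdown: str) -> Tuple[float, List[str]]:
    """Return (share of returned URLs absent from the source, those URLs)."""
    returned = list(returned)
    source = urls_in_source(markdown)
    invented = [url for url in returned if url not in source]
    rate = len(invented) / len(returned) if returned else 0.0
    return rate, invented


page = "[Home](https://example.com/) and ![logo](https://example.com/logo.png)."
rate, invented = hallucination_rate(
    ["https://example.com/", "https://example.com/about"], page
)
print(rate, invented)  # 0.5 ['https://example.com/about']
```

This catches URLs the model made up, but it will also flag relative links that the model correctly expanded to absolute form, so treat the rate as an upper bound unless you normalize both sides first.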

The pricing math holds up on its face: $0.25 per million input tokens, $1.50 per million output tokens, with batch API discounts that bring 1,000 scraped pages down to around $4. That's cheap enough that cost is no longer a meaningful barrier to operating an LLM scraping pipeline at scale.
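To see how a figure near $4 could arise, here is the arithmetic as a small function. Only the two per-million-token prices come from the article. The per-page token counts and the 50% batch discount are assumptions picked for illustration, so substitute measurements from your own pages.

```python
INPUT_PRICE_PER_M = 0.25   # USD per million input tokens (quoted above)
OUTPUT_PRICE_PER_M = 1.50  # USD per million output tokens (quoted above)


def estimate_cost(
    pages: int,
    input_tokens_per_page: int,
    output_tokens_per_page: int,
    batch_discount: float = 0.0,
) -> float:
    """Estimated USD cost of scraping `pages` pages at the quoted prices."""
    input_cost = pages * input_tokens_per_page * INPUT_PRICE_PER_M / 1_000_000
    output_cost = pages * output_tokens_per_page * OUTPUT_PRICE_PER_M / 1_000_000
    return (input_cost + output_cost) * (1.0 - batch_discount)


# Hypothetical workload: 20,000 input and 2,000 output tokens per page,
# with an assumed 50% batch discount.
print(estimate_cost(1_000, 20_000, 2_000, batch_discount=0.5))  # 4.0
```

Under these assumed numbers the undiscounted run would cost $8, which shows how sensitive the headline figure is to page size and to whether batch pricing applies to your workload.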

The grounding trap

The most practically useful thing in Hamish's video is a warning that has nothing to do with Flash Lite specifically—it applies to any Gemini model with grounding enabled.

Grounding lets the model pull live Google Search results to augment its responses, similar to how Perplexity works. It's genuinely useful for research tasks. It's also priced per query, and models will run as many queries as they think they need unless you explicitly constrain them.

Hamish discovered this the hard way: his daily API spend went from $10 to $150 overnight because a model decided 30 queries was the right number for a given task. "We went from spending like $10 a day to $150 a day, which luckily I noticed and stopped," he says. His advice: cap the query count explicitly in your system prompt, or skip grounding entirely and do traditional LLM scraping instead. Good advice. The kind of thing that would have saved his team real money if someone had put it in the documentation more prominently.
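A prompt-level cap depends on the model obeying it, so a client-side spend guard is a reasonable second line of defense. The sketch below is a hypothetical helper, not a Gemini feature: it tracks the cost you attribute to each request and refuses further charges once a daily limit is reached. How you estimate each request's cost is left to you.

```python
import datetime
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push the day's spend past the limit."""


class DailySpendGuard:
    """Track estimated spend per calendar day and stop at a fixed limit."""

    def __init__(
        self,
        limit_usd: float,
        today: Callable[[], datetime.date] = datetime.date.today,
    ) -> None:
        self.limit_usd = limit_usd
        self._today = today
        self._day = today()
        self.spent_usd = 0.0

    def charge(self, amount_usd: float) -> None:
        """Record a cost, resetting the tally when the date changes."""
        current = self._today()
        if current != self._day:
            self._day = current
            self.spent_usd = 0.0
        if self.spent_usd + amount_usd > self.limit_usd:
            raise BudgetExceeded(
                f"charge of ${amount_usd:.2f} would exceed ${self.limit_usd:.2f}/day"
            )
        self.spent_usd += amount_usd
```

Call `charge()` before each request with your estimated cost for it, and let `BudgetExceeded` halt the job. A guard like this would not have prevented the extra grounding queries, but it would have stopped the run near $10 instead of $150.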

The part the video doesn't cover

Hamish is transparent about what he's doing with this pipeline: HarborSEO, his product, uses LLM scraping to extract product data and information from other people's websites, then feeds that structured data into AI-generated blog posts. He's also explicit that the data itself is sellable—"LLM scraping is something that you can basically resell. So what you actually resell is the JSON structured data."

That's a legitimate business model, and it's not illegal in most jurisdictions (robots.txt compliance and ToS questions vary by site). But it's worth naming what it actually is: an automated system that reads websites at scale without the site owner's knowledge or consent, extracts their content, and repackages it into a competing product.

The websites being scraped don't know it's happening. They don't get paid. When the output is an AI-written article that surfaces in search results alongside—or instead of—the original source, the economics flow entirely one direction.

I'm not saying Hamish is doing anything wrong. The web has always had crawlers; Google's entire business is built on scraping. But there's a difference between a search engine indexing a page so users can find the original, and a content pipeline extracting structured data to generate articles that substitute for the original. One sends traffic to sources. The other doesn't need to.

The real reason this model release matters, beyond the benchmark numbers, is that it removes cost as a friction point for the latter. At $4 per thousand pages, the barrier to running an industrial-scale content extraction operation is essentially gone. That's what "100% accuracy at this price" means in practice.

Whether that's a problem depends on where you sit. If you're a developer building data pipelines, this is a genuinely useful tool at a genuinely good price. If you publish original content, research, or product information on the web, the model that just dropped makes it cheaper than ever for someone else to extract the value from your work and monetize it elsewhere.

Hamish isn't the threat here—he's just the early adopter showing you the playbook. The question worth sitting with is what happens when everyone running a content business reads the same tutorial.


Rachel "Rach" Kovacs is BuzzRAG's cybersecurity and privacy correspondent. Former white hat hacker, former InfoSec director, permanently suspicious of anything priced to scale.


