
Web Scraping With an API: A Beginner's Guide

Anna Kubo's freeCodeCamp tutorial shows beginners how to scrape the web using SerpApi and Node.js — skipping the hard parts without skipping the learning.

Written by AI. Tyler Nakamura

June 9, 2026 · 6 min read

Here's the deal with web scraping tutorials: most of them start by assuming you already know how to lose. You're halfway through the setup, a website throws a CAPTCHA at you, your script breaks, and you're googling "proxy rotation" at 11pm wondering why you even started. The learning curve isn't the code — it's the walls websites put up before you even get to write any.

That's the exact problem Anna Kubo addresses head-on in her new freeCodeCamp tutorial, Web Scraping for Beginners – Extract Data with an API. Her pitch is refreshingly honest: instead of teaching you to fight those walls yourself, she shows you how to let someone else handle them.

"Instead of building scrapers from scratch, dealing with proxies, or constantly fixing broken scripts, we're going to use an API that handles all of that for us. It runs searches in a real browser, solves CAPTCHAs automatically, and returns clean, structured data in JSON that we can use directly in our apps."

That API is SerpApi, which acts as a fully managed scraping layer between your code and the web. You send it a query; it runs the search in a real browser, handles bot detection and rate limits on its end, and hands you back clean JSON. The tradeoff — and it's worth naming plainly — is that you're introducing a paid third-party dependency into your project. SerpApi has a free tier, but if your scraping needs scale up, so does the bill. Kubo is transparent about this, and the tutorial is actually sponsored by SerpApi, which she discloses upfront. That context matters when evaluating her approach.

What the Tutorial Actually Covers

The hour-long course is structured in two acts: learn the tools, then build something real with them.

The first act walks through SerpApi's capabilities using a Node.js project. After signing up and grabbing an API key, Kubo runs a dead-simple terminal command — two parameters, q for query and your API key — and gets back Google search results as JSON. From there, she layers in optional parameters: location, language, Google domain, country code. The documentation walkthrough is genuinely useful here; she points out that you can't just make up parameter values and shows you where to find SerpApi's supported options.
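To make the parameter layering concrete, here is a minimal Node.js sketch of how such a request could be assembled. It is not the tutorial's code. It assumes SerpApi's `https://serpapi.com/search.json` endpoint and the parameter names `location`, `hl`, `gl`, and `google_domain` for location, language, country code, and Google domain; the function names `buildSearchUrl` and `search` are hypothetical, and `search` relies on the global `fetch` available in Node 18 and later.

```javascript
// Sketch only: build a SerpApi Google Search URL from a query, a key,
// and optional parameters. Helper names are hypothetical.
const SERPAPI_ENDPOINT = "https://serpapi.com/search.json";

function buildSearchUrl(query, apiKey, options = {}) {
  // The two required pieces from the article: q and the API key.
  const params = new URLSearchParams({
    engine: "google",
    q: query,
    api_key: apiKey,
  });
  // Optional parameters (assumed names: location, hl, gl, google_domain).
  // Values should come from SerpApi's supported lists, not be made up.
  for (const [name, value] of Object.entries(options)) {
    if (value !== undefined && value !== null) {
      params.set(name, String(value));
    }
  }
  return `${SERPAPI_ENDPOINT}?${params.toString()}`;
}

// Sends the request and returns the parsed JSON body.
async function search(query, apiKey, options) {
  const response = await fetch(buildSearchUrl(query, apiKey, options));
  if (!response.ok) {
    throw new Error(`SerpApi request failed: ${response.status}`);
  }
  return response.json();
}
```

Calling `search("coffee shops", key, { location: "Austin, Texas, United States", hl: "en", gl: "us" })` would then resolve to the JSON results, assuming the key is valid and the location string is one SerpApi recognizes.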

What's worth paying attention to is how much ground the API covers. Beyond Google Search, SerpApi offers engines for Bing, DuckDuckGo, Amazon product listings, the Apple App Store, YouTube search, and more. Kubo also demos the Google Short Videos API (for scraping Instagram Reels, YouTube Shorts, and similar content) and the Google Lens API, which lets you run reverse image searches programmatically and pull back visual match results. The Lens demo — passing in a photo of Danny DeVito and getting back visual matches from across the web — is a solid illustration of how weird and powerful this kind of tooling can be.
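Switching engines is mostly a matter of changing one parameter. The sketch below illustrates that idea for the Lens demo; it is an assumption-laden example, not Kubo's code. The engine name `google_lens`, the `url` parameter for the image, and the `visual_matches` array with `title`, `link`, and `source` fields are assumptions about SerpApi's response shape, and both function names are hypothetical.

```javascript
// Sketch only: request URL for a reverse image search, plus a helper
// that trims the response down to a few visual matches.
function buildLensUrl(imageUrl, apiKey) {
  const params = new URLSearchParams({
    engine: "google_lens", // assumed engine name
    url: imageUrl,         // assumed parameter: public URL of the image
    api_key: apiKey,
  });
  return `https://serpapi.com/search.json?${params.toString()}`;
}

// Assumes the JSON response carries a `visual_matches` array whose
// entries have `title`, `link`, and `source` fields.
function topVisualMatches(response, limit = 5) {
  const matches = Array.isArray(response.visual_matches)
    ? response.visual_matches
    : [];
  return matches
    .slice(0, limit)
    .map(({ title, link, source }) => ({ title, link, source }));
}
```

A missing or malformed `visual_matches` field yields an empty list rather than an exception, which keeps a demo script from crashing on an unexpected response.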

SerpApi's documentation also covers integrations for Python, Java, Rust, and even Google Sheets, though Kubo builds everything in Node.js here.

The Real Project: A Short Video Scraper

The second act is where things get interesting from a "what can I actually do with this" perspective. Kubo builds a full-stack web app — Node.js/Express backend, plain HTML/CSS/JavaScript frontend — that:

  1. Takes a search query and a result count (5, 10, or 15 videos)
  2. Hits SerpApi's Google Short Videos engine
  3. Renders the results as a grid of cards with thumbnails, titles, durations, and sources
  4. Downloads individual videos (or all of them at once) to your local machine via yt-dlp
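The first three steps can be sketched as a few small functions on the backend. This is a hedged illustration rather than the code from Kubo's repository: the engine name `google_short_videos`, the `short_video_results` response key, and its `title`, `thumbnail`, `duration`, `source`, and `link` fields are assumptions, and the helper names are hypothetical.

```javascript
// Sketch only: validate the result count, build the request URL,
// and reduce the response to the fields a card grid needs.
const ALLOWED_COUNTS = [5, 10, 15];

// Falls back to the smallest allowed count for anything unexpected.
function normalizeCount(requested) {
  const n = Number(requested);
  return ALLOWED_COUNTS.includes(n) ? n : ALLOWED_COUNTS[0];
}

function buildShortVideosUrl(query, apiKey) {
  const params = new URLSearchParams({
    engine: "google_short_videos", // assumed engine name
    q: query,
    api_key: apiKey,
  });
  return `https://serpapi.com/search.json?${params.toString()}`;
}

// Assumes results arrive under `short_video_results`.
function toCards(response, count) {
  const results = Array.isArray(response.short_video_results)
    ? response.short_video_results
    : [];
  return results.slice(0, normalizeCount(count)).map((video) => ({
    title: video.title ?? "Untitled",
    thumbnail: video.thumbnail ?? null,
    duration: video.duration ?? null,
    source: video.source ?? null,
    link: video.link ?? null,
  }));
}
```

An Express route would call `buildShortVideosUrl`, fetch the JSON, and send `toCards(json, req.query.count)` to the frontend, which only has to loop over the array and render one card per entry.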

The yt-dlp integration is the genuinely clever part. SerpApi returns video metadata and links; yt-dlp is the open-source tool that actually fetches and saves the video file as an MP4. Kubo runs it as a child process from within Node using execFile, which is a pattern worth knowing if you're new to Node's child_process module.
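For readers who have not used `child_process` before, a minimal sketch of the pattern looks like this. It assumes `yt-dlp` is installed and on the PATH. The article does not list the flags Kubo passes, so `--remux-video mp4` and the `-o` output template here are illustrative choices, and the function names are hypothetical.

```javascript
// Sketch only: run yt-dlp as a child process to save one video.
const { execFile } = require("node:child_process");
const path = require("node:path");

// Builds the argument list separately so it is easy to inspect.
function buildYtDlpArgs(videoUrl, outputDir) {
  return [
    "--remux-video", "mp4", // remux into an MP4 container if needed
    "-o", path.join(outputDir, "%(title)s.%(ext)s"), // output template
    videoUrl,
  ];
}

// Resolves with yt-dlp's stdout, rejects with its stderr on failure.
function downloadVideo(videoUrl, outputDir) {
  return new Promise((resolve, reject) => {
    execFile("yt-dlp", buildYtDlpArgs(videoUrl, outputDir), (error, stdout, stderr) => {
      if (error) {
        reject(new Error(stderr || error.message));
      } else {
        resolve(stdout);
      }
    });
  });
}
```

One reason `execFile` suits this job: it takes the arguments as an array and does not spawn a shell by default, so a video URL coming from user input or an API response is never spliced into a shell command string.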

The build is intentionally stripped down — she calls it "a skeleton" — but that framing is honest rather than apologetic. The point isn't to hand you a finished product; it's to show you the connective tissue so you can extend it yourself. The full code is on GitHub if you want to dig in.

The Part That Actually Matters: API Key Security

Scattered throughout the tutorial — and given its own dedicated section at the end — is a consistent emphasis on keeping your API key out of your code. Kubo shows her own key on screen early on and immediately explains why that's fine in that context (she's disabling it after filming) but dangerous in production:

"If other people get hold of your API key and use it in their projects, they could potentially use up all your credits or even rack up a hefty credit card bill if you decide to attach credit cards to your platforms. That goes for every API key out there."

The solution she implements is a .env file with the dotenv package, storing the key as an environment variable accessed via process.env.API_KEY. It's a foundational security habit, and the fact that she builds the whole project first and then retrofits the env variable handling actually reinforces why it matters — you can see exactly where the exposure would occur.
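As a rough sketch of that habit, assuming a `.env` file in the project root and the `dotenv` package installed, the key lookup can be wrapped so the app fails loudly when the variable is missing. The helper name `getApiKey` is hypothetical, and the `dotenv` line is commented out so the snippet also runs without the package.

```javascript
// .env (keep this file out of version control, e.g. via .gitignore):
//   API_KEY=your_serpapi_key_here

// With dotenv installed, this line loads .env into process.env:
// require("dotenv").config();

// Sketch only: read the key from the environment instead of the source.
function getApiKey(env = process.env) {
  const key = env.API_KEY;
  if (!key || key.trim() === "") {
    throw new Error("API_KEY is not set; add it to .env or the environment");
  }
  return key.trim();
}
```

Anywhere the project previously had the key pasted in as a string literal, it would call `getApiKey()` instead, which is exactly the spot where the exposure would otherwise occur.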

This is the kind of thing that beginner tutorials often skip because it feels like infrastructure rather than learning. Kubo doesn't skip it.

The Honest Tension Here

There's a real question buried in this approach that the tutorial doesn't fully surface, and it's worth sitting with: what are you actually learning when the hard parts are abstracted away?

Traditional web scraping tutorials — the kind Kubo describes as "super complicated" or prone to breaking — teach you to understand HTTP requests, parse HTML with tools like Cheerio or BeautifulSoup, rotate user agents, handle pagination, and debug when sites change their markup. That's genuinely difficult, and a lot of beginners bounce off it. Kubo's approach gets you to working, useful output faster. But the knowledge of why CAPTCHAs exist, how rate limiting works, and what's actually happening when SerpApi "runs searches in a real browser" lives inside a black box you're paying to not open.

That's not a criticism of the tutorial, which is honest about what it is. For someone who needs to automate data collection for a project and doesn't have weeks to spend on scraping infrastructure, this is a practical path. For someone who wants to understand web scraping deeply, it is a solid starting point that may eventually prompt them to go deeper on their own.

"I just want to give you the skeleton to do this. So you can go forth, make it your own, add features, expand on this idea."

That framing — skeleton, not finished product — is probably the right way to hold the whole tutorial. Kubo isn't selling you a complete solution. She's showing you that working data extraction is more accessible than the hard-mode tutorials suggest, and handing you something you can actually run and modify.

The question of whether the abstraction is training wheels or a ceiling depends entirely on where you want to go next.


— Tyler Nakamura, Consumer Tech & Gadgets Correspondent, BuzzRAG
