Recipe Scraper / Q1 2025
48,000 Recipes Scraped at 99.98 Percent Success Rate.
Async scraping across 15 sources with rate limiting, deduplication, and a Flask UI.
// The Build
Async scraping via aiohttp with bounded concurrency and polite rate limiting. Three-retry logic with exponential backoff for network failures. URL-based deduplication with checkpoint persistence and separate failed URL tracking.
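A minimal sketch of how the pieces above might fit together, not the project's actual code. It uses only `asyncio` so it runs without a network. The `fetch` callable is a stand-in for an aiohttp request, and the names `fetch_with_retry`, `scrape_all`, `save_checkpoint`, `max_concurrency`, `pause`, and `base_delay` are all hypothetical. It also assumes "three-retry" means three retries after the first attempt.

```python
import asyncio
import json
from typing import Awaitable, Callable, Iterable

Fetch = Callable[[str], Awaitable[str]]
RETRYABLE = (OSError, asyncio.TimeoutError)  # assumed set of network failures


async def fetch_with_retry(fetch: Fetch, url: str, retries: int = 3,
                           base_delay: float = 1.0) -> str:
    """Try fetch(url) once, then retry up to `retries` times on failure."""
    for attempt in range(retries + 1):
        try:
            return await fetch(url)
        except RETRYABLE:
            if attempt == retries:
                raise
            # Exponential backoff: base_delay, then 2x, then 4x.
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


async def scrape_all(fetch: Fetch, urls: Iterable[str], seen: set[str],
                     max_concurrency: int = 5, pause: float = 0.0,
                     base_delay: float = 1.0) -> tuple[dict[str, str], list[str]]:
    """Fetch unseen URLs with bounded concurrency; return (results, failed)."""
    sem = asyncio.Semaphore(max_concurrency)
    results: dict[str, str] = {}
    failed: list[str] = []

    async def worker(url: str) -> None:
        async with sem:  # at most max_concurrency fetches in flight
            try:
                results[url] = await fetch_with_retry(fetch, url, base_delay=base_delay)
                seen.add(url)
            except RETRYABLE:
                failed.append(url)  # tracked separately from successes
            await asyncio.sleep(pause)  # polite gap before freeing the slot

    # URL-based deduplication: drop repeats and anything already scraped.
    pending = list(dict.fromkeys(u for u in urls if u not in seen))
    await asyncio.gather(*(worker(u) for u in pending))
    return results, failed


def save_checkpoint(path: str, seen: set[str], failed: list[str]) -> None:
    """Persist progress so an interrupted run can resume."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"seen": sorted(seen), "failed": failed}, fh)
```

With aiohttp, `fetch` would wrap a session `get` call and the retryable exceptions would include the client's own error types. Holding the semaphore during the pause is one simple way to throttle; a per-host limiter would be the more careful choice.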
All 15 recipe sources are supported through the recipe-scrapers library. Each recipe is normalized into a unified schema with structured ingredient parsing (converting '2 cups flour' into quantity, unit, and name), automatic search term generation, and fractional quantity support.
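To illustrate the ingredient parsing step, here is a rough sketch of splitting a line such as '2 cups flour' into quantity, unit, and name, with fractions kept exact. The `Ingredient` class, the `UNITS` set, and the regex are assumptions made for this example; the real schema and unit list are not shown here.

```python
import re
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

# Illustrative unit list only; a real parser would need many more.
UNITS = {"cup", "cups", "tbsp", "tsp", "g", "kg", "ml", "l", "oz", "lb"}

# Leading quantity: mixed number ("1 1/2"), fraction ("3/4"), or decimal ("2", "0.5").
_QTY = re.compile(r"^\s*(\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s*(.*)$")


@dataclass
class Ingredient:
    quantity: Optional[Fraction]
    unit: Optional[str]
    name: str


def parse_quantity(text: str) -> Fraction:
    """Sum the whitespace-separated parts, so '1 1/2' becomes 3/2."""
    return sum((Fraction(part) for part in text.split()), Fraction(0))


def parse_ingredient(line: str) -> Ingredient:
    match = _QTY.match(line)
    if not match:
        # No leading number: keep the whole line as the name.
        return Ingredient(None, None, line.strip())
    quantity = parse_quantity(match.group(1))
    rest = match.group(2).strip()
    first, _, remainder = rest.partition(" ")
    if first.lower() in UNITS and remainder:
        return Ingredient(quantity, first.lower(), remainder.strip())
    # A number with no recognized unit, e.g. "3 eggs".
    return Ingredient(quantity, None, rest)
```

Using `Fraction` rather than `float` means '1/3 cup' scales cleanly when a recipe is doubled or halved, which is one plausible reason to support fractional quantities in the schema.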
Flask web UI for managing imports and browsing the collection. The data feeds directly into the Hearthlight meal planning platform.
// The Outcome
48K+ recipes collected at a 99.98% success rate, with zero duplicates.
Recipes scraped: 48K+
Success rate: 99.98%
Sources: 15
Duplicates: 0
Hearthlight
Q1 2026
Next.js 16, three subscription tiers, tarot readings, moon phases, and recipe management. 346K lines of code.
Own product
Personal Finance Dashboard
Q1 2025
Automated PDF extraction, multi-account consolidation, and spending analytics.
Own product
SEO Intelligence System
Q4 2025
Semantic search across every page, product, and keyword so new content finds its slot, not its competition.
NDA client
Got a similar problem? Let’s talk.