Automated retail
listing intelligence
How we replaced daily manual page-checking with a serverless Python pipeline that monitors Sam's Club listings every morning, bypasses bot protection, tracks price and stock history in SQLite, and delivers a formatted HTML digest to the inbox — at zero monthly cost.
Manual page-checking is slow, stateless, and easy to skip
Monitoring product listings at Sam's Club — tracking prices, variant availability, coupons, and stock status — was entirely manual. A buyer opens 19 browser tabs, scans each page, and tries to remember whether prices changed since yesterday. There's no history. No alerts. No audit trail.
On top of that, Sam's Club runs Kasada bot protection, which silently blocks any naive scraping attempt and returns a clean-looking page with no real data. A direct HTTP GET isn't enough — the protection layer has to be actively routed around.
- One browser tab per product
- No price history
- Coupons easy to miss
- Stock checks give a point-in-time snapshot only
- OOS products surface hours or days late
19 products checked, scraped, diffed against yesterday, and delivered as a formatted HTML digest before the workday starts. Day-over-day price arrows. Per-variant stock. Coupon badges. Zero manual effort.
A five-stage pipeline, no server required
Cron trigger — GitHub Actions
A scheduled workflow fires daily at 8:30 AM EST. No server, no scheduler service. GitHub Actions is the entire execution environment — secrets stored in the repo settings, Python set up fresh on each run.
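The scheduling layer can be sketched as a workflow file along these lines. Step names, secret names, and the requirements file are illustrative assumptions rather than the exact production daily-monitor.yml; note that GitHub cron expressions run in UTC, so 8:30 AM EST is written as 13:30.

```yaml
# .github/workflows/daily-monitor.yml (sketch; names are illustrative)
name: daily-monitor
on:
  schedule:
    - cron: "30 13 * * *"   # 13:30 UTC = 8:30 AM EST (cron has no DST handling)
  workflow_dispatch: {}      # allow manual runs from the Actions tab
jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python monitor.py
        env:
          SCRAPERAPI_KEY: ${{ secrets.SCRAPERAPI_KEY }}
          GMAIL_APP_PASSWORD: ${{ secrets.GMAIL_APP_PASSWORD }}
      - name: Commit updated history.db
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add history.db
          git commit -m "chore: daily monitor run" || echo "no changes"
          git push
```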
Product list — Google Sheets CSV
The watchlist lives in a Google Sheet published as a CSV. sheets.py fetches and validates it with no API key required. Non-technical users can add or remove products by editing the sheet — no code change needed.
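A minimal sketch of the fetch-and-validate step, assuming a published-CSV URL and `name`/`url` column headers (the real sheets.py may use different column names):

```python
import csv
import io
import urllib.request

# Assumed column headers; the production sheet may use different ones.
REQUIRED_COLUMNS = {"name", "url"}

def parse_products(csv_text: str) -> list[dict]:
    """Parse the watchlist CSV and validate that required columns exist."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"watchlist missing columns: {sorted(missing)}")
    # Drop blank rows so a stray empty line in the sheet can't break a run.
    return [row for row in reader if (row.get("url") or "").strip()]

def fetch_products(published_csv_url: str) -> list[dict]:
    """Download the Google Sheet published as CSV; no API key required."""
    with urllib.request.urlopen(published_csv_url, timeout=30) as resp:
        return parse_products(resp.read().decode("utf-8"))
```

Validating columns up front means a renamed sheet header fails loudly on the next run instead of silently producing an empty report.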
Bot-bypass scraping — ScraperAPI + layered fallback
Sam's Club URLs are routed through ScraperAPI's residential proxy pool, which bypasses Kasada. The page's embedded __NEXT_DATA__ JSON blob — the canonical source for price, variant maps, and coupon data — is parsed directly. Generic retailers fall through a four-strategy fallback chain.
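The Sam's Club path can be sketched as follows. The endpoint and `api_key`/`url` query parameters follow ScraperAPI's documented GET API, but any extra proxy-pool flags the production code sets are omitted, and the regex assumes the standard Next.js script-tag shape:

```python
import json
import re
import urllib.parse
import urllib.request

SCRAPERAPI_ENDPOINT = "https://api.scraperapi.com/"

def fetch_page(api_key: str, target_url: str) -> str:
    """Fetch a page through ScraperAPI's proxy pool instead of directly."""
    query = urllib.parse.urlencode({"api_key": api_key, "url": target_url})
    with urllib.request.urlopen(f"{SCRAPERAPI_ENDPOINT}?{query}", timeout=90) as resp:
        return resp.read().decode("utf-8")

# Standard Next.js embed: <script id="__NEXT_DATA__" type="application/json">...</script>
NEXT_DATA_RE = re.compile(
    r'<script\s+id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)

def extract_next_data(html: str) -> dict:
    """Pull the embedded product-graph JSON blob out of the page source."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        # A blocked request comes back as a clean-looking page with no blob.
        raise ValueError("no __NEXT_DATA__ found; request was likely blocked")
    return json.loads(match.group(1))
```

Raising when the blob is absent distinguishes "blocked by Kasada" from "product has no data", which keeps a silent block from masquerading as an empty listing.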
History database — SQLite committed to repo
Every result is persisted to history.db. After each run, the updated DB is auto-committed back to the repository — Git history doubles as the audit log. No external storage service. No credentials to rotate. Day-over-day price deltas are computed at report time from the DB.
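A sketch of the history store, with an assumed single-table schema (the real db.py layout is not shown in this write-up):

```python
import sqlite3

# Assumed schema: one row per product per run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    product_url TEXT NOT NULL,
    checked_at  TEXT NOT NULL,          -- ISO date of the run
    price       REAL,
    in_stock    INTEGER,
    PRIMARY KEY (product_url, checked_at)
)
"""

def record(conn: sqlite3.Connection, url: str, day: str,
           price: float, in_stock: bool) -> None:
    """Persist one scrape result; re-running the same day overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO observations VALUES (?, ?, ?, ?)",
        (url, day, price, int(in_stock)),
    )

def day_over_day(conn: sqlite3.Connection, url: str):
    """Latest price plus its delta versus the previous run."""
    rows = conn.execute(
        "SELECT price FROM observations WHERE product_url = ? "
        "ORDER BY checked_at DESC LIMIT 2",
        (url,),
    ).fetchall()
    if not rows:
        return None, None
    latest = rows[0][0]
    delta = latest - rows[1][0] if len(rows) == 2 else None
    return latest, delta
```

Computing deltas at report time, rather than storing them, means the raw observations stay the single source of truth and the comparison window can change without a migration.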
HTML email report — Gmail SMTP
The report is built with fully inline CSS — immune to email client stripping. Summary cards show counts of in-stock, OOS, price changes, and errors. Every product row shows a price delta arrow, coupon badge, and per-variant stock detail. Sent via Gmail App Password auth.
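The delivery step reduces to the standard library: a multipart/alternative message (plain-text fallback plus inline-styled HTML) sent over Gmail's SMTP-over-SSL endpoint with an App Password. A sketch, with the subject line and helper names assumed:

```python
import smtplib
import ssl
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_digest(sender: str, recipient: str,
                 html_body: str, text_body: str) -> MIMEMultipart:
    """multipart/alternative: clients render HTML, fall back to plain text."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = "Daily listing digest"   # illustrative subject line
    msg["From"] = sender
    msg["To"] = recipient
    msg.attach(MIMEText(text_body, "plain"))  # fallback part goes first
    msg.attach(MIMEText(html_body, "html"))   # preferred part goes last
    return msg

def send_via_gmail(sender: str, app_password: str, msg: MIMEMultipart) -> None:
    """Authenticate with a Gmail App Password over implicit TLS (port 465)."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as smtp:
        smtp.login(sender, app_password)
        smtp.send_message(msg)

# Every style rule lives on the element itself, e.g.:
# <td style="padding:8px;font-family:sans-serif;color:#0a7a28">+$1.50</td>
```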
Six modules, one cron, no infrastructure
```
GitHub Actions (cron: 8:30 AM EST daily)
        │
        ▼
monitor.py ← entry point / orchestrator
 ├── sheets.py   ← fetch product list from Google Sheet CSV
 ├── scrapers.py ← scrape price/stock/variants for each URL
 │    ├── Sam's Club path: ScraperAPI → __NEXT_DATA__ JSON
 │    │     variant map · coupon · price range · per-variant stock
 │    └── Generic path: JSON-LD → meta tags → CSS selectors → regex scan
 │          Amazon · Walmart · Target · Best Buy · Home Depot · Lowe's
 ├── db.py     ← SQLite schema · read/write · trend queries
 ├── report.py ← HTML + plain-text email builder (inline CSS)
 └── config.py ← env var loader with validation and defaults
        │
        ▼
Gmail SMTP → inbox
        │
        ▼
git commit history.db → repo (Git history = audit log)
```
What each file does
| File | Role |
|---|---|
| monitor.py | Orchestrates the full pipeline end-to-end |
| scrapers.py | Multi-strategy price/stock extraction (456 lines); Sam's Club + 5 generic retailers |
| sheets.py | Google Sheets CSV fetch and column validation |
| db.py | SQLite schema, read/write, day-over-day delta queries |
| report.py | HTML email builder with inline CSS + plain-text fallback |
| config.py | Env var loader with type coercion, validation, and sensible defaults |
| daily-monitor.yml | GitHub Actions cron definition, Python setup, auto-commit step |
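config.py's contract can be sketched as a single helper; the function name and error type here are illustrative, not the module's actual API:

```python
import os

class ConfigError(RuntimeError):
    """Raised at startup so a misconfigured run fails before any scraping."""

def env(name: str, *, default=None, cast=str, required: bool = False):
    """Read one env var with type coercion, a default, and validation."""
    raw = os.environ.get(name)
    if raw is None or raw == "":
        if required and default is None:
            raise ConfigError(f"missing required env var: {name}")
        return default
    try:
        return cast(raw)
    except (TypeError, ValueError) as exc:
        raise ConfigError(f"bad value for {name}: {raw!r}") from exc
```

A caller might then write `SCRAPERAPI_KEY = env("SCRAPERAPI_KEY", required=True)` or `TIMEOUT = env("TIMEOUT", default=90, cast=int)`, so a missing secret aborts the run immediately rather than mid-scrape.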
Four-strategy fallback chain
Sam's Club gets a dedicated code path. Every other retailer goes through a layered fallback that tries the most reliable extraction method first and degrades gracefully to regex if nothing structured is available.
**Sam's Club path (ScraperAPI + `__NEXT_DATA__`).** Residential proxy routes around Kasada. The Next.js embedded JSON blob contains the canonical product graph: price, variants, instant savings coupons, and per-SKU stock status. Most reliable; used exclusively for samsclub.com.

1. **JSON-LD.** Parses Schema.org Product + Offer blocks. Reliable on retailers that correctly implement structured markup. First fallback for all non-Sam's URLs.
2. **Meta tags.** Reads og:price:amount, og:availability, and product meta tags. Lower fidelity than JSON-LD but broadly supported across major retail sites.
3. **CSS selectors.** Maintained selector maps for Amazon, Walmart, Target, Best Buy, Home Depot, and Lowe's — highest-confidence site-specific extraction when JSON-LD and meta tags fail or are absent.
4. **Regex scan.** Last-resort stock phrase detection from raw page text. Catches "In Stock," "Out of Stock," and "Limited availability" patterns when structured data is unavailable.
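The chain itself reduces to a first-hit-wins loop. A sketch, with the strategy list passed in so each extractor stays independently testable; `regex_stock_scan` illustrates the last-resort strategy, and the other three strategies would be stand-ins for the real scrapers.py functions:

```python
import re
from typing import Callable, Optional

Strategy = Callable[[str], Optional[dict]]

def run_fallback_chain(html: str, strategies: list[Strategy]) -> Optional[dict]:
    """Try each extraction strategy in reliability order; first hit wins."""
    for strategy in strategies:
        try:
            result = strategy(html)
        except Exception:
            result = None          # a crashing strategy just falls through
        if result:
            return result
    return None

# Strategy 4: last-resort stock phrase detection on raw page text.
STOCK_PHRASES = re.compile(
    r"\b(in stock|out of stock|limited availability)\b", re.IGNORECASE
)

def regex_stock_scan(html: str) -> Optional[dict]:
    match = STOCK_PHRASES.search(html)
    if match is None:
        return None
    return {"in_stock": match.group(1).lower() == "in stock", "source": "regex"}
```

Swallowing strategy exceptions is deliberate here: one retailer changing its markup degrades that product's extraction quality instead of failing the whole run.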
Why it's built this way
No external database. No credentials to manage beyond the secrets already stored in GitHub Actions. Git history provides a full timestamped audit trail of every run's database state at zero added cost.
Non-technical users can add, remove, or update products by editing a spreadsheet. No deploys, no PRs, no code changes — the pipeline reads the sheet fresh on every run.
Each GitHub Actions execution is fully independent. All state lives in the database; the cron job reads it, writes to it, and commits it back. No in-memory state survives between runs by design.
Email clients strip external stylesheets and <style> tags. Every formatting rule is inlined at build time, ensuring the HTML digest renders correctly in Gmail, Outlook, and Apple Mail without degradation.
On cost: ScraperAPI free tier covers 5,000 credits/month. At ~140 credits per run, the system uses roughly 4,200 credits/month — comfortably under the limit. GitHub Actions free tier adds 2,000 minutes/month; a daily job running for a few minutes uses a small fraction. Total monthly infrastructure cost: $0.
What changed
| Dimension | Before | After |
|---|---|---|
| Price checks | Manual, browser, one tab per product | Automated — 8:30 AM, full list, inbox |
| Stock visibility | Single page load, no history | Per-variant status, 30-day SQLite trend |
| Coupon detection | Easy to miss — no alert | Auto-detected, badged in every report row |
| Price change alerts | None — no day-over-day comparison | ↑↓ arrows with exact delta, every row |
| OOS response time | Hours or days | Same morning, by 8:30 AM |
| Infrastructure cost | N/A | $0/month — fully within free tiers |
| Adding products | Open another browser tab | Edit the Google Sheet — next run picks it up |
Technologies used
No LLM agents in this system. This is a scripted automation pipeline — deterministic, fast, and purpose-built for high-frequency structured data extraction. AI was used to accelerate the build, not as a runtime component.