West Atlas — Sam's Club Listing Monitor Case Study

Automated retail listing intelligence

How we replaced daily manual page-checking with a serverless Python pipeline that monitors Sam's Club listings every morning, bypasses bot protection, tracks price and stock history in SQLite, and delivers a formatted HTML digest to the inbox — at zero monthly cost.

Built by West Atlas LLC
Use case CPG procurement & reseller intelligence
Stack Python · GitHub Actions · ScraperAPI · SQLite · Gmail SMTP
Status Live — production runs confirmed
995
Lines of code — 6 modules
19
Products monitored daily
4
Layered scraping strategies
6
Retailers with custom selectors
$0
Monthly infrastructure cost
1 day
Build time — 20+ commits

Manual page-checking is slow, stateless, and easy to skip

Monitoring product listings at Sam's Club — tracking prices, variant availability, coupons, and stock status — was entirely manual. A buyer opens 19 browser tabs, scans each page, and tries to remember whether prices changed since yesterday. There's no history. No alerts. No audit trail.

On top of that, Sam's Club runs Kasada bot protection, which silently blocks any naive scraping attempt and returns a clean-looking page with no real data. A direct HTTP GET isn't enough — the protection layer has to be actively routed around.

Manual workflow
Open, scan, forget

One browser tab per product. No price history. Coupons easy to miss. Stock checks give a point-in-time snapshot only. OOS products surface hours or days late.

Automated workflow
8:30 AM — inbox, every day

19 products checked, scraped, diffed against yesterday, and delivered as a formatted HTML digest before the workday starts. Day-over-day price arrows. Per-variant stock. Coupon badges. Zero manual effort.

A five-stage pipeline, no server required

01

Cron trigger — GitHub Actions

A scheduled workflow fires daily at 8:30 AM EST. No server, no scheduler service. GitHub Actions is the entire execution environment — secrets stored in the repo settings, Python set up fresh on each run.
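A sketch of what the daily-monitor.yml cron definition could look like. GitHub Actions cron expressions run in UTC, so 8:30 AM EST corresponds to 13:30 UTC; the secret names, action versions, and commit step shown here are illustrative, not the repo's actual values.

```yaml
name: daily-monitor

on:
  schedule:
    - cron: "30 13 * * *"    # GitHub cron is UTC: 13:30 UTC = 8:30 AM EST
  workflow_dispatch:          # allow manual runs

permissions:
  contents: write             # needed to push history.db back to the repo

jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python monitor.py
        env:
          SCRAPERAPI_KEY: ${{ secrets.SCRAPERAPI_KEY }}
          GMAIL_APP_PASSWORD: ${{ secrets.GMAIL_APP_PASSWORD }}
      - name: Commit updated history.db
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add history.db
          git diff --cached --quiet || git commit -m "chore: daily monitor run"
          git push
```

One caveat of a fixed UTC cron: the local fire time drifts by an hour across daylight-saving transitions (8:30 AM EST becomes 9:30 AM EDT) unless the schedule is adjusted twice a year.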

github-actions
02

Product list — Google Sheets CSV

The watchlist lives in a Google Sheet published as a CSV. sheets.py fetches and validates it with no API key required. Non-technical users can add or remove products by editing the sheet — no code change needed.
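A minimal sketch of the fetch-and-validate step, using only the standard library. The column names and function names are assumptions for illustration, not sheets.py's real interface.

```python
import csv
import io
import urllib.request

# Columns the pipeline expects in the published sheet (assumed names;
# the real sheets.py may validate a different header set).
REQUIRED_COLUMNS = {"name", "url"}

def parse_watchlist(csv_text: str) -> list[dict]:
    """Parse and validate the published-CSV watchlist."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"watchlist is missing columns: {sorted(missing)}")
    # Skip blank rows so a stray empty line in the sheet doesn't break a run.
    return [row for row in reader if (row.get("url") or "").strip()]

def fetch_watchlist(published_csv_url: str) -> list[dict]:
    """Fetch the Google Sheet published as CSV; no API key needed."""
    with urllib.request.urlopen(published_csv_url, timeout=30) as resp:
        return parse_watchlist(resp.read().decode("utf-8"))
```

Validating the header up front means a renamed or deleted column fails loudly at the start of the run instead of producing a silently empty report.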

sheets.py
03

Bot-bypass scraping — ScraperAPI + layered fallback

Sam's Club URLs are routed through ScraperAPI's residential proxy pool, which bypasses Kasada. The page's embedded __NEXT_DATA__ JSON blob — the canonical source for price, variant maps, and coupon data — is parsed directly. Generic retailers fall through a four-strategy fallback chain.
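The Sam's Club path can be sketched as below. The api.scraperapi.com endpoint taking api_key and url query parameters is ScraperAPI's documented HTTP API; the function names and the regex-based extraction are illustrative stand-ins for the real scrapers.py.

```python
import json
import re
import urllib.parse
import urllib.request

SCRAPERAPI_ENDPOINT = "http://api.scraperapi.com/"

def fetch_via_scraperapi(url: str, api_key: str, **options: str) -> str:
    """Route a request through ScraperAPI's proxy pool; extra options
    (e.g. premium/residential routing flags) pass through as query params."""
    params = {"api_key": api_key, "url": url, **options}
    full_url = SCRAPERAPI_ENDPOINT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(full_url, timeout=90) as resp:
        return resp.read().decode("utf-8")

_NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)

def extract_next_data(html: str) -> dict:
    """Pull the Next.js __NEXT_DATA__ JSON blob out of the raw page."""
    match = _NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("__NEXT_DATA__ not found; page may have been blocked")
    return json.loads(match.group(1))
```

Parsing the embedded JSON blob rather than the rendered DOM is what makes this path robust: the blob is the same product graph the page itself renders from, so a layout redesign doesn't break the scraper.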

scrapers.py · ScraperAPI
04

History database — SQLite committed to repo

Every result is persisted to history.db. After each run, the updated DB is auto-committed back to the repository — Git history doubles as the audit log. No external storage service. No credentials to rotate. Day-over-day price deltas are computed at report time from the DB.
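A condensed sketch of the db.py pattern under an assumed schema: one row per product per run, with the day-over-day delta computed from the two most recent rows at report time.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS checks (
    url        TEXT NOT NULL,
    checked_at TEXT NOT NULL,      -- ISO date of the run
    price      REAL,
    in_stock   INTEGER,
    PRIMARY KEY (url, checked_at)
);
"""

def record_check(conn, url, checked_at, price, in_stock):
    # Re-running the same day overwrites that day's row instead of duplicating it.
    conn.execute(
        "INSERT OR REPLACE INTO checks VALUES (?, ?, ?, ?)",
        (url, checked_at, price, int(in_stock)),
    )

def day_over_day_delta(conn, url):
    """Price delta between the two most recent checks for a URL."""
    rows = conn.execute(
        "SELECT price FROM checks WHERE url = ? ORDER BY checked_at DESC LIMIT 2",
        (url,),
    ).fetchall()
    if len(rows) < 2 or rows[0][0] is None or rows[1][0] is None:
        return None   # first-ever check, or a failed scrape on either day
    return rows[0][0] - rows[1][0]
```

Because the whole file is committed after each run, `git log -p history.db` effectively replays every day's state change.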

db.py · sqlite
05

HTML email report — Gmail SMTP

The report is built with fully inline CSS — immune to email client stripping. Summary cards show counts of in-stock, OOS, price changes, and errors. Every product row shows a price delta arrow, coupon badge, and per-variant stock detail. Sent via Gmail App Password auth.
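The multipart structure this relies on can be sketched as follows: the plain-text part goes first because clients render the last alternative they support, and the HTML part carries every style rule inline. Function names are illustrative, not report.py's real API.

```python
import smtplib
import ssl
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_digest(subject, html_body, text_body, sender, recipient):
    """Build a multipart/alternative message: plain-text fallback first,
    inline-CSS HTML last."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.attach(MIMEText(text_body, "plain"))
    msg.attach(MIMEText(html_body, "html"))
    return msg

def send_via_gmail(msg, sender, app_password):
    """Send over Gmail's SMTP-over-SSL endpoint using an App Password."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as smtp:
        smtp.login(sender, app_password)
        smtp.send_message(msg)
```

An App Password sidesteps OAuth entirely: one secret in the Actions repo settings, no token refresh flow to maintain.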

report.py · gmail-smtp

Six modules, one cron, no infrastructure

GitHub Actions (cron: 8:30 AM EST daily)
    │
    ▼
monitor.py              ← entry point / orchestrator
├── sheets.py           ← fetch product list from Google Sheet CSV
├── scrapers.py         ← scrape price/stock/variants for each URL
│   ├── Sam's Club path  ScraperAPI → __NEXT_DATA__ JSON
│   │                    variant map · coupon · price range · per-variant stock
│   └── Generic path     JSON-LD → meta tags → CSS selectors → regex scan
│                        Amazon · Walmart · Target · Best Buy · Home Depot · Lowe's
├── db.py               ← SQLite schema · read/write · trend queries
├── report.py           ← HTML + plain-text email builder (inline CSS)
└── config.py           ← env var loader with validation and defaults
    │
    ▼
Gmail SMTP  inbox
    │
    ▼
git commit history.db → repo (Git history = audit log)

What each file does

File                Role
monitor.py          Orchestrates the full pipeline end-to-end
scrapers.py         Multi-strategy price/stock extraction (456 lines); Sam's Club + 6 generic retailers
sheets.py           Google Sheets CSV fetch and column validation
db.py               SQLite schema, read/write, day-over-day delta queries
report.py           HTML email builder with inline CSS + plain-text fallback
config.py           Env var loader with type coercion, validation, and sensible defaults
daily-monitor.yml   GitHub Actions cron definition, Python setup, auto-commit step

Four-strategy fallback chain

Sam's Club gets a dedicated code path. Every other retailer goes through a layered fallback that tries the most reliable extraction method first and degrades gracefully to regex if nothing structured is available.

S1
Sam's Club — ScraperAPI + __NEXT_DATA__
Residential proxy routes around Kasada. The Next.js embedded JSON blob contains the canonical product graph: price, variants, instant savings coupons, and per-SKU stock status. Most reliable; used exclusively for samsclub.com.
F1
JSON-LD structured data
Parses Schema.org Product + Offer blocks. Reliable on retailers that correctly implement structured markup. First fallback for all non-Sam's URLs.
F2
Open Graph / meta tags
Reads og:price:amount, og:availability, and product meta tags. Lower fidelity than JSON-LD but broadly supported across major retail sites.
F3
Site-specific CSS selectors
Maintained selector maps for Amazon, Walmart, Target, Best Buy, Home Depot, and Lowe's — highest-confidence extraction when JSON-LD and meta fail or are absent.
F4
Regex text scan
Last-resort stock phrase detection from raw page text. Catches "In Stock," "Out of Stock," "Limited availability" patterns when structured data is unavailable.
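The chain above can be sketched as a list of strategies tried in reliability order, where the first hit wins. This is a simplified stand-in for scrapers.py: F3's per-retailer CSS selector maps are omitted, and the JSON-LD handling covers only the simplest single-object shape.

```python
import json
import re

def from_json_ld(html):
    """F1: Schema.org Product/Offer blocks embedded as JSON-LD."""
    pattern = r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>'
    for m in re.finditer(pattern, html, re.DOTALL):
        try:
            data = json.loads(m.group(1))
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("@type") == "Product":
            offer = data.get("offers", {})
            return {"price": float(offer["price"]),
                    "in_stock": "InStock" in offer.get("availability", "")}
    return None

def from_meta_tags(html):
    """F2: Open Graph product meta tags (lower fidelity than JSON-LD)."""
    m = re.search(r'<meta[^>]*property="og:price:amount"[^>]*content="([^"]+)"', html)
    if m:
        return {"price": float(m.group(1)), "in_stock": None}
    return None

def from_regex_scan(html):
    """F4: last-resort stock-phrase scan of the raw page text."""
    if re.search(r"\bout of stock\b", html, re.IGNORECASE):
        return {"price": None, "in_stock": False}
    if re.search(r"\bin stock\b", html, re.IGNORECASE):
        return {"price": None, "in_stock": True}
    return None

def scrape_generic(html):
    """Try each strategy in order of reliability; first hit wins."""
    for strategy in (from_json_ld, from_meta_tags, from_regex_scan):
        result = strategy(html)
        if result is not None:
            return result
    return {"price": None, "in_stock": None}
```

The ordering matters: structured sources return typed, unambiguous values, while the regex scan can only classify stock phrasing, so it runs last and never overrides a structured hit.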

Why it's built this way

Storage
SQLite committed to the repo

No external database. No credentials to manage beyond the secrets already stored in GitHub Actions. Git history provides a full timestamped audit trail of every run's database state at zero added cost.

Product management
Google Sheets as the CMS

Non-technical users can add, remove, or update products by editing a spreadsheet. No deploys, no PRs, no code changes — the pipeline reads the sheet fresh on every run.

Execution model
Fully stateless runs

Each GitHub Actions execution is fully independent. All state lives in the database; the cron job reads it, writes to it, and commits it back. No in-memory state survives between runs by design.

Email rendering
Inline CSS throughout

Email clients strip external stylesheets and <style> tags. Every formatting rule is inlined at build time, ensuring the HTML digest renders correctly in Gmail, Outlook, and Apple Mail without degradation.

On cost: ScraperAPI free tier covers 5,000 credits/month. At ~140 credits per run, the system uses roughly 4,200 credits/month — comfortably under the limit. GitHub Actions free tier adds 2,000 minutes/month; a daily job running for a few minutes uses a small fraction. Total monthly infrastructure cost: $0.

What changed

Dimension             Before                                  After
Price checks          Manual, browser, one tab per product    Automated — 8:30 AM, full list, inbox
Stock visibility      Single page load, no history            Per-variant status, 30-day SQLite trend
Coupon detection      Easy to miss — no alert                 Auto-detected, badged in every report row
Price change alerts   None — no day-over-day comparison       ↑↓ arrows with exact delta, every row
OOS response time     Hours or days                           Same morning, by 8:30 AM
Infrastructure cost   N/A                                     $0/month — fully within free tiers
Adding products       Open another browser tab                Edit the Google Sheet — next run picks it up

Technologies used

No LLM agents in this system. This is a scripted automation pipeline — deterministic, fast, and purpose-built for high-frequency structured data extraction. AI was used to accelerate the build, not as a runtime component.

Python 3.12 GitHub Actions ScraperAPI SQLite Gmail SMTP Google Sheets BeautifulSoup4 lxml requests