Automated retail
listing intelligence
How we replaced daily manual page-checking with a serverless Python pipeline that monitors Sam's Club listings every morning, bypasses bot protection, tracks price and stock history in SQLite, and delivers a formatted HTML digest to the inbox — at zero monthly cost.
Manual page-checking is slow, stateless, and easy to skip
Monitoring product listings at Sam's Club — tracking prices, variant availability, coupons, and stock status — was entirely manual. A buyer opens 19 browser tabs, scans each page, and tries to remember whether prices changed since yesterday. There's no history. No alerts. No audit trail.
On top of that, Sam's Club runs Kasada bot protection, which silently blocks any naive scraping attempt and returns a clean-looking page with no real data. A direct HTTP GET isn't enough — the protection layer has to be actively routed around.
- One browser tab per product
- No price history
- Coupons easy to miss
- Stock checks give a point-in-time snapshot only
- OOS products surface hours or days late
19 products checked, scraped, diffed against yesterday, and delivered as a formatted HTML digest before the workday starts. Day-over-day price arrows. Per-variant stock. Coupon badges. Zero manual effort.
A five-stage pipeline, no server required
Cron trigger — GitHub Actions
A scheduled workflow fires daily at 8:30 AM EST. No server, no scheduler service. GitHub Actions is the entire execution environment — secrets stored in the repo settings, Python set up fresh on each run.
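The scheduling layer can be sketched as a workflow file along these lines. Step names, secret names, and the requirements file are illustrative assumptions rather than the exact production daily-monitor.yml; note that GitHub cron expressions run in UTC, so 8:30 AM EST is written as 13:30.

```yaml
# .github/workflows/daily-monitor.yml (sketch; names are illustrative)
name: daily-monitor
on:
  schedule:
    - cron: "30 13 * * *"   # 13:30 UTC = 8:30 AM EST (cron has no DST handling)
  workflow_dispatch: {}      # allow manual runs from the Actions tab
jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python monitor.py
        env:
          SCRAPERAPI_KEY: ${{ secrets.SCRAPERAPI_KEY }}
          GMAIL_APP_PASSWORD: ${{ secrets.GMAIL_APP_PASSWORD }}
      - name: Commit updated history.db
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add history.db
          git commit -m "chore: daily monitor run" || echo "no changes"
          git push
```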
Product list — Google Sheets CSV
The watchlist lives in a Google Sheet published as a CSV. sheets.py fetches and validates it with no API key required. Non-technical users can add or remove products by editing the sheet — no code change needed.
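A minimal sketch of the fetch-and-validate step, assuming a published-CSV URL and `name`/`url` column headers (the real sheets.py may use different column names):

```python
import csv
import io
import urllib.request

# Assumed column headers; the production sheet may use different ones.
REQUIRED_COLUMNS = {"name", "url"}

def parse_products(csv_text: str) -> list[dict]:
    """Parse the watchlist CSV and validate that required columns exist."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"watchlist missing columns: {sorted(missing)}")
    # Drop blank rows so a stray empty line in the sheet can't break a run.
    return [row for row in reader if (row.get("url") or "").strip()]

def fetch_products(published_csv_url: str) -> list[dict]:
    """Download the Google Sheet published as CSV; no API key required."""
    with urllib.request.urlopen(published_csv_url, timeout=30) as resp:
        return parse_products(resp.read().decode("utf-8"))
```

Validating columns up front means a renamed sheet header fails loudly on the next run instead of silently producing an empty report.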
Bot-bypass scraping — ScraperAPI + layered fallback
Sam's Club URLs are routed through ScraperAPI's residential proxy pool, which bypasses Kasada. The page's embedded __NEXT_DATA__ JSON blob — the canonical source for price, variant maps, and coupon data — is parsed directly. Generic retailers fall through a four-strategy fallback chain.
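The Sam's Club path can be sketched as follows. The endpoint and `api_key`/`url` query parameters follow ScraperAPI's documented GET API, but any extra proxy-pool flags the production code sets are omitted, and the regex assumes the standard Next.js script-tag shape:

```python
import json
import re
import urllib.parse
import urllib.request

SCRAPERAPI_ENDPOINT = "https://api.scraperapi.com/"

def fetch_page(api_key: str, target_url: str) -> str:
    """Fetch a page through ScraperAPI's proxy pool instead of directly."""
    query = urllib.parse.urlencode({"api_key": api_key, "url": target_url})
    with urllib.request.urlopen(f"{SCRAPERAPI_ENDPOINT}?{query}", timeout=90) as resp:
        return resp.read().decode("utf-8")

# Standard Next.js embed: <script id="__NEXT_DATA__" type="application/json">...</script>
NEXT_DATA_RE = re.compile(
    r'<script\s+id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)

def extract_next_data(html: str) -> dict:
    """Pull the embedded product-graph JSON blob out of the page source."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        # A blocked request comes back as a clean-looking page with no blob.
        raise ValueError("no __NEXT_DATA__ found; request was likely blocked")
    return json.loads(match.group(1))
```

Raising when the blob is absent distinguishes "blocked by Kasada" from "product has no data", which keeps a silent block from masquerading as an empty listing.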
History database — SQLite committed to repo
Every result is persisted to history.db. After each run, the updated DB is auto-committed back to the repository — Git history doubles as the audit log. No external storage service. No credentials to rotate. Day-over-day price deltas are computed at report time from the DB.
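A sketch of the history store, with an assumed single-table schema (the real db.py layout is not shown in this write-up):

```python
import sqlite3

# Assumed schema: one row per product per run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    product_url TEXT NOT NULL,
    checked_at  TEXT NOT NULL,          -- ISO date of the run
    price       REAL,
    in_stock    INTEGER,
    PRIMARY KEY (product_url, checked_at)
)
"""

def record(conn: sqlite3.Connection, url: str, day: str,
           price: float, in_stock: bool) -> None:
    """Persist one scrape result; re-running the same day overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO observations VALUES (?, ?, ?, ?)",
        (url, day, price, int(in_stock)),
    )

def day_over_day(conn: sqlite3.Connection, url: str):
    """Latest price plus its delta versus the previous run."""
    rows = conn.execute(
        "SELECT price FROM observations WHERE product_url = ? "
        "ORDER BY checked_at DESC LIMIT 2",
        (url,),
    ).fetchall()
    if not rows:
        return None, None
    latest = rows[0][0]
    delta = latest - rows[1][0] if len(rows) == 2 else None
    return latest, delta
```

Computing deltas at report time, rather than storing them, means the raw observations stay the single source of truth and the comparison window can change without a migration.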
HTML email report — Gmail SMTP
The report is built with fully inline CSS — immune to email client stripping. Summary cards show counts of in-stock, OOS, price changes, and errors. Every product row shows a price delta arrow, coupon badge, and per-variant stock detail. Sent via Gmail App Password auth.
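The delivery step reduces to the standard library: a multipart/alternative message (plain-text fallback plus inline-styled HTML) sent over Gmail's SMTP-over-SSL endpoint with an App Password. A sketch, with the subject line and helper names assumed:

```python
import smtplib
import ssl
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_digest(sender: str, recipient: str,
                 html_body: str, text_body: str) -> MIMEMultipart:
    """multipart/alternative: clients render HTML, fall back to plain text."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = "Daily listing digest"   # illustrative subject line
    msg["From"] = sender
    msg["To"] = recipient
    msg.attach(MIMEText(text_body, "plain"))  # fallback part goes first
    msg.attach(MIMEText(html_body, "html"))   # preferred part goes last
    return msg

def send_via_gmail(sender: str, app_password: str, msg: MIMEMultipart) -> None:
    """Authenticate with a Gmail App Password over implicit TLS (port 465)."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as smtp:
        smtp.login(sender, app_password)
        smtp.send_message(msg)

# Every style rule lives on the element itself, e.g.:
# <td style="padding:8px;font-family:sans-serif;color:#0a7a28">+$1.50</td>
```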
Six modules, one cron, no infrastructure
```
GitHub Actions (cron: 8:30 AM EST daily)
        │
        ▼
monitor.py ← entry point / orchestrator
 ├── sheets.py   ← fetch product list from Google Sheet CSV
 ├── scrapers.py ← scrape price/stock/variants for each URL
 │    ├── Sam's Club path: ScraperAPI → __NEXT_DATA__ JSON
 │    │     variant map · coupon · price range · per-variant stock
 │    └── Generic path: JSON-LD → meta tags → CSS selectors → regex scan
 │          Amazon · Walmart · Target · Best Buy · Home Depot · Lowe's
 ├── db.py     ← SQLite schema · read/write · trend queries
 ├── report.py ← HTML + plain-text email builder (inline CSS)
 └── config.py ← env var loader with validation and defaults
        │
        ▼
Gmail SMTP → inbox
        │
        ▼
git commit history.db → repo (Git history = audit log)
```
What each file does
| File | Role |
|---|---|
| monitor.py | Orchestrates the full pipeline end-to-end |
| scrapers.py | Multi-strategy price/stock extraction (456 lines); Sam's Club + 5 generic retailers |
| sheets.py | Google Sheets CSV fetch and column validation |
| db.py | SQLite schema, read/write, day-over-day delta queries |
| report.py | HTML email builder with inline CSS + plain-text fallback |
| config.py | Env var loader with type coercion, validation, and sensible defaults |
| daily-monitor.yml | GitHub Actions cron definition, Python setup, auto-commit step |
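config.py's contract can be sketched as a single helper; the function name and error type here are illustrative, not the module's actual API:

```python
import os

class ConfigError(RuntimeError):
    """Raised at startup so a misconfigured run fails before any scraping."""

def env(name: str, *, default=None, cast=str, required: bool = False):
    """Read one env var with type coercion, a default, and validation."""
    raw = os.environ.get(name)
    if raw is None or raw == "":
        if required and default is None:
            raise ConfigError(f"missing required env var: {name}")
        return default
    try:
        return cast(raw)
    except (TypeError, ValueError) as exc:
        raise ConfigError(f"bad value for {name}: {raw!r}") from exc
```

A caller might then write `SCRAPERAPI_KEY = env("SCRAPERAPI_KEY", required=True)` or `TIMEOUT = env("TIMEOUT", default=90, cast=int)`, so a missing secret aborts the run immediately rather than mid-scrape.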
Four-strategy fallback chain
Sam's Club gets a dedicated code path. Every other retailer goes through a layered fallback that tries the most reliable extraction method first and degrades gracefully to regex if nothing structured is available.
**Sam's Club path (ScraperAPI + `__NEXT_DATA__`).** Residential proxy routes around Kasada. The Next.js embedded JSON blob contains the canonical product graph: price, variants, instant savings coupons, and per-SKU stock status. Most reliable; used exclusively for samsclub.com.

1. **JSON-LD.** Parses Schema.org Product + Offer blocks. Reliable on retailers that correctly implement structured markup. First fallback for all non-Sam's URLs.
2. **Meta tags.** Reads og:price:amount, og:availability, and product meta tags. Lower fidelity than JSON-LD but broadly supported across major retail sites.
3. **CSS selectors.** Maintained selector maps for Amazon, Walmart, Target, Best Buy, Home Depot, and Lowe's — highest-confidence site-specific extraction when JSON-LD and meta tags fail or are absent.
4. **Regex scan.** Last-resort stock phrase detection from raw page text. Catches "In Stock," "Out of Stock," and "Limited availability" patterns when structured data is unavailable.
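The chain itself reduces to a first-hit-wins loop. A sketch, with the strategy list passed in so each extractor stays independently testable; `regex_stock_scan` illustrates the last-resort strategy, and the other three strategies would be stand-ins for the real scrapers.py functions:

```python
import re
from typing import Callable, Optional

Strategy = Callable[[str], Optional[dict]]

def run_fallback_chain(html: str, strategies: list[Strategy]) -> Optional[dict]:
    """Try each extraction strategy in reliability order; first hit wins."""
    for strategy in strategies:
        try:
            result = strategy(html)
        except Exception:
            result = None          # a crashing strategy just falls through
        if result:
            return result
    return None

# Strategy 4: last-resort stock phrase detection on raw page text.
STOCK_PHRASES = re.compile(
    r"\b(in stock|out of stock|limited availability)\b", re.IGNORECASE
)

def regex_stock_scan(html: str) -> Optional[dict]:
    match = STOCK_PHRASES.search(html)
    if match is None:
        return None
    return {"in_stock": match.group(1).lower() == "in stock", "source": "regex"}
```

Swallowing strategy exceptions is deliberate here: one retailer changing its markup degrades that product's extraction quality instead of failing the whole run.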
Why it's built this way
No external database. No credentials to manage beyond the secrets already stored in GitHub Actions. Git history provides a full timestamped audit trail of every run's database state at zero added cost.
Non-technical users can add, remove, or update products by editing a spreadsheet. No deploys, no PRs, no code changes — the pipeline reads the sheet fresh on every run.
Each GitHub Actions execution is fully independent. All state lives in the database; the cron job reads it, writes to it, and commits it back. No in-memory state survives between runs by design.
Email clients strip external stylesheets and <style> tags. Every formatting rule is inlined at build time, ensuring the HTML digest renders correctly in Gmail, Outlook, and Apple Mail without degradation.
On cost: ScraperAPI free tier covers 5,000 credits/month. At ~140 credits per run, the system uses roughly 4,200 credits/month — comfortably under the limit. GitHub Actions free tier adds 2,000 minutes/month; a daily job running for a few minutes uses a small fraction. Total monthly infrastructure cost: $0.
What changed
| Dimension | Before | After |
|---|---|---|
| Price checks | Manual, browser, one tab per product | Automated — 8:30 AM, full list, inbox |
| Stock visibility | Single page load, no history | Per-variant status, 30-day SQLite trend |
| Coupon detection | Easy to miss — no alert | Auto-detected, badged in every report row |
| Price change alerts | None — no day-over-day comparison | ↑↓ arrows with exact delta, every row |
| OOS response time | Hours or days | Same morning, by 8:30 AM |
| Infrastructure cost | N/A | $0/month — fully within free tiers |
| Adding products | Open another browser tab | Edit the Google Sheet — next run picks it up |
Technologies used
No LLM agents in this system. This is a scripted automation pipeline — deterministic, fast, and purpose-built for high-frequency structured data extraction. AI was used to accelerate the build, not as a runtime component.