Replacing a fragile export
with a production sales pipeline
How I moved a daily sales dashboard off a manual Extensiv export and onto a direct Order Manager API pipeline - turning a 45-minute morning ritual into an hourly scheduled job that runs unattended, validates its own schema, and keeps the Power BI report fresh all day.
A fragile morning ritual, one sick day from failure
The team's Power BI sales dashboard depended on a manual workflow that had been running for years. Someone logged into Extensiv Order Manager every morning, ran a saved export to Excel, dropped the file into a SharePoint folder, and waited for Power BI to pick it up and refresh the report.
It worked - until it didn't. If the person was out, the dashboard didn't update. If Extensiv timed out mid-export, the file was corrupt. If the saved export filter drifted, the downstream schema broke silently and the report came out wrong. Leadership was depending on a dashboard that had a single human failure point buried at the top of its pipeline.
One person logs into Extensiv each morning, runs the saved export, waits for Excel to finish, uploads to SharePoint. No backup. No run history. Silent failures when filters drift.
Python hits the Extensiv Order Manager API on an hourly schedule, transforms the response into the exact Power BI schema, validates it, writes it to SharePoint, and logs the run. Zero humans required.
A four-stage pipeline, end-to-end in under two minutes
Scheduled trigger - hourly, unattended
A scheduled task fires every hour throughout the business day. No human involvement. Survives machine reboots via Windows Service registration.
Data fetch - Extensiv Order Manager API
Authenticates with OAuth2 and pulls orders incrementally - only records modified since the last run. Paginated to survive high-volume days - post-promo Mondays that would have broken the old Excel export don't break this.
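The incremental, paginated pull can be sketched roughly as below. This is an illustrative shape only: the endpoint URL, parameter names (`modifiedSince`, `page`, `pageSize`), and response layout are placeholders, not the real Extensiv Order Manager API, and the injectable `get` parameter exists so the generator can be exercised without a live connection.

```python
from datetime import datetime

def fetch_orders_since(token, last_run, get=None,
                       base_url="https://api.example.com/orders",  # placeholder URL
                       page_size=500):
    """Yield orders modified since last_run, one page at a time."""
    if get is None:                 # default to a real HTTP client
        import requests
        get = requests.get
    page = 1
    while True:
        resp = get(
            base_url,
            headers={"Authorization": f"Bearer {token}"},
            params={
                "modifiedSince": last_run.isoformat(),  # incremental filter
                "page": page,
                "pageSize": page_size,
            },
            timeout=30,
        )
        resp.raise_for_status()     # fail loudly on auth/server errors
        batch = resp.json().get("orders", [])
        yield from batch
        if len(batch) < page_size:  # last (possibly partial) page
            break
        page += 1
```

Streaming page by page is what keeps post-promo volume spikes from timing out the way the old Excel export did: no single request ever asks for the whole day.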
Transform - normalize to Power BI schema
Python transforms the API response into the exact column layout, types, and channel mapping the downstream Power BI report expects. Every field is type-coerced and date-normalized. Schema is validated before the write, not after.
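A minimal sketch of the per-record transform. The column names, channel codes, and field names here are invented for illustration; the real contract lives in the Power BI report's schema.

```python
from datetime import datetime

# Hypothetical channel codes and target columns, standing in for the real mapping.
CHANNEL_MAP = {"AMZ": "Amazon", "EBY": "eBay", "WEB": "Webstore"}
COLUMNS = ["OrderId", "OrderDate", "Channel", "Quantity", "Revenue"]

def transform_order(raw: dict) -> dict:
    """Coerce one API record into the exact row shape the report expects."""
    return {
        "OrderId": str(raw["orderId"]),                      # force string keys
        "OrderDate": datetime.fromisoformat(raw["orderDate"]).strftime("%Y-%m-%d"),
        "Channel": CHANNEL_MAP.get(raw["channel"], "Other"), # normalize channels
        "Quantity": int(raw["quantity"]),                    # API may send strings
        "Revenue": round(float(raw["revenue"]), 2),
    }
```

Every field goes through an explicit coercion, so a type change upstream surfaces here as an exception rather than drifting silently into the report.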
Load - SharePoint write + run log
The output file lands in the same SharePoint location the manual export used to. Prior version is archived. Every run writes a log entry with timestamp, record count, and status. The existing Power BI report picks it up unchanged.
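The load stage might look like the sketch below, assuming the SharePoint library is reachable as a synced folder path; the filename and log layout are illustrative. The write-then-rename keeps Power BI from ever seeing a half-written file.

```python
import csv
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

def load(rows, out_dir, filename="sales_export.csv",
         columns=("OrderId", "OrderDate", "Channel", "Quantity", "Revenue")):
    """Write the output file, archiving the prior version and logging the run."""
    out_dir = Path(out_dir)               # stands in for the SharePoint folder
    target = out_dir / filename
    archive = out_dir / "archive"
    archive.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    if target.exists():                   # keep the prior version
        shutil.copy2(target, archive / f"{stamp}_{filename}")
    tmp = target.with_suffix(".tmp")      # write-then-rename: no partial files
    with tmp.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)
    tmp.replace(target)
    entry = {"timestamp": stamp, "records": len(rows), "status": "ok"}
    with (out_dir / "run_log.jsonl").open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```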
One job, one contract, zero downstream changes
```
Scheduled Task (hourly, unattended)
        │
        ▼
pipeline.py ← entry point / orchestrator
 ├── auth.py      ← OAuth2 client credentials · token refresh
 ├── extract.py   ← Extensiv Order Manager API · incremental pull
 │                  orders · line items · ship dates · channels
 ├── transform.py ← normalize to Power BI schema
 │                  type coercion · date normalization · channel mapping
 ├── validate.py  ← schema check before write
 │                  column names · types · non-null fields · row count sanity
 ├── load.py      ← SharePoint write + prior-version archive
 └── logger.py    ← run log · timestamp · record count · status
        │
        ▼
SharePoint → fixed location, identical schema
        │
        ▼
Existing Power BI report picks it up unchanged (zero downstream changes)
```
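The orchestrator in the layout above can be reduced to one sequence. In this sketch the stages are injected as plain callables so the flow is visible (and testable) without the real modules; in the actual job each stage would come from its own file (`auth.py`, `extract.py`, and so on).

```python
def run_pipeline(auth, extract, transform, validate, load):
    """One run: any stage failure raises and aborts before the write."""
    token = auth()                               # OAuth2 client credentials
    rows = [transform(r) for r in extract(token)]
    validate(rows)                               # raises on drift: nothing is written
    return load(rows)                            # returns the run-log entry
```

Because `validate` sits between transform and load, a failed run leaves the prior SharePoint file untouched, and the scheduler's exit status records the failure.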
What each file does
| File | Role |
|---|---|
| pipeline.py | Orchestrates the full extract-transform-validate-load sequence end-to-end |
| auth.py | OAuth2 client credentials flow with automatic token refresh |
| extract.py | Paginated, incremental Extensiv Order Manager API pull with explicit field selection |
| transform.py | Normalizes API response into Power BI schema — type coercion, channel mapping, date handling |
| validate.py | Pre-write schema validation: column names, types, non-null checks, row-count sanity |
| load.py | SharePoint write with prior-version archive; preserves downstream file location exactly |
| logger.py | Structured run log with timestamp, record count, and pass/fail status per stage |
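The pre-write check in `validate.py` could be sketched as follows; the expected columns and types are the same illustrative ones used above, not the real contract.

```python
# Illustrative contract: column -> expected Python type.
EXPECTED = {"OrderId": str, "OrderDate": str, "Channel": str,
            "Quantity": int, "Revenue": float}
REQUIRED_NON_NULL = {"OrderId", "OrderDate"}

class SchemaError(ValueError):
    """Raised before the write so the prior file stays in place."""

def validate(rows, min_rows=1):
    if len(rows) < min_rows:                      # row-count sanity
        raise SchemaError(f"row count {len(rows)} < {min_rows}")
    for i, row in enumerate(rows):
        if set(row) != set(EXPECTED):             # column names
            raise SchemaError(f"row {i}: columns {sorted(row)} don't match contract")
        for col, typ in EXPECTED.items():         # types
            if row[col] is not None and not isinstance(row[col], typ):
                raise SchemaError(f"row {i}: {col} is {type(row[col]).__name__}")
        for col in REQUIRED_NON_NULL:             # non-null fields
            if row[col] in (None, ""):
                raise SchemaError(f"row {i}: {col} must be non-null")
    return True
```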
Why it's built this way
The temptation was to rebuild the Power BI report from scratch. The better move was to match the exact schema the existing report expected - byte-compatible output, zero downstream changes. One week of work instead of three months.
Schema validation runs before the SharePoint write, not after. Silent schema drift is the single most common failure mode in data pipelines. Better to fail loudly and leave the prior file in place than corrupt the downstream report.
On a normal hour one API call is enough. On the hour after a promo drop it isn't. Pagination goes into the first version or it gets built under pressure in production at the worst possible time.
"Did the dashboard update this hour?" should take one second to answer, not ten minutes. Every run writes a structured log entry with timestamp, record count, and status. Failures alert immediately.
The bigger pattern: if a human is clicking "export" every morning for a scheduled report, you have a single point of failure disguised as a routine. Jobs like this are the easiest wins in any operations stack - high-impact, low-risk, invisible to the people consuming the output.
What changed
| Dimension | Before | After |
|---|---|---|
| Trigger | Human logs in, runs saved export | Scheduled task, hourly, unattended |
| Freshness | Once a day, if it ran at all | Hourly - dashboard stays current through the day |
| Time cost | ~45 minutes of attention every morning | Zero - runs unattended |
| Failure mode | Silent - corrupt file, wrong schema, missed export | Loud - schema validation blocks bad writes, run log flags errors |
| Coverage | Breaks on sick days, holidays, PTO | Runs every day, including weekends |
| Volume handling | Excel export timed out on high-volume days | Paginated API pull, survives any day's volume |
| Audit trail | None - no way to tell whether the export even ran | Structured run log with timestamp, count, status |
| Downstream impact | N/A | Zero - existing Power BI report unchanged |
Technologies used
Deterministic ETL - no LLM agents. The job is structured data in, structured data out, schema-validated at the boundary. Python does the work, the scheduler handles the timing, and the existing Microsoft stack carries it from SharePoint into Power BI.