Specialty Retail Brand — TJX/MarMaxx PO Processor Case Study

Automated retailer PO
processing & upload

How we replaced a manual, error-prone PO conversion workflow with a browser-based tool that parses TJX/HomeGoods/Marshalls PDFs, validates every line, and exports Extensiv-ready upload files in three steps — with a human review gate before anything touches the WMS.

Built for: Specialty Retail Brand
Retail channels: TJX · HomeGoods · Marshalls
Stack: React · PDF.js · Extensiv API
Status: Core tool live — API integration next
3 · Step workflow: upload, review, export
18 · Extensiv template columns, fully populated
30+ · DC locations parsed automatically
0 · Manual reformatting steps
100% · Browser-based — no server, no install
45 min → 3 min · Per PO, start to Extensiv-ready export

45 minutes of manual work per PO — brittle, slow, and invisible to QC

Processing a single TJX purchase order took roughly 45 minutes from PDF to Extensiv-ready CSV. TJX Companies sends purchase orders as multi-page PDFs: a header page with styles, quantities, and unit costs, followed by a distribution section that breaks those quantities down by destination DC. To get those orders into Extensiv (formerly Skubana), someone had to manually cross-reference the two sections, build one row per SKU per DC, calculate order totals, and format 18 columns correctly every time.

Without a tool, this process lived in spreadsheets. DC addresses were typed by hand. Unit prices were looked up from the header and matched to each distribution row manually. Date fields were reformatted to survive Excel's CSV parser. Any mistake — a transposed quantity, a mismatched SKU, a dropped leading zero on a zip code — flowed directly into Extensiv and required a manual correction. The same PO now processes in under 3 minutes.

Before — ~45 minutes
Header costs manually matched to distribution rows for every SKU
DC addresses typed from PDF — no address lookup or validation
Date fields reformatted by hand and re-broken by Excel on open
Leading zeros on zip codes silently stripped by Excel CSV
No per-order total verification before upload
Multi-PO merges require copy-paste across separate spreadsheets
After — ~3 minutes
PDF parser extracts header and distribution sections; joins by Vendor Style automatically
DC addresses parsed from PO text with DC number appended to name
Dates stored as mm/dd/yyyy text — survive Excel open without reformatting
Zip codes quoted in CSV output — leading zeros always preserved
Total Amount Paid calculated and repeated across every row of each order
Multiple POs merged in one export session — single combined file

Three steps, one human review gate, zero server

The tool runs entirely in the browser. PDF.js extracts text client-side; a parsing pipeline handles the layout variation across TJX, HomeGoods, and Marshalls documents; the result lands in an editable review table before anything is exported. The buyer never touches a spreadsheet until the file is done.

Input
User uploads one or more PO PDFs via drag-and-drop
Accepts main header PO and distribution/breakdown pages in any combination
Step 1 — PDF Parser
PDF.js extracts raw text → layout-aware parser identifies header and distribution sections
Handles columnar DC layouts · SKU line-split repair · cancel-date lookahead · multi-PO merging
Header extractor
Vendor Style → Item SKU + Unit Cost from header PO line items
Start Ship / Cancel dates · PO number · Order date
Distribution extractor
2-digit DC prefix → address + quantities per Vendor Style
DC name + number · Address split into 5 columns · Country code
↓ Joined by Vendor Style · Ship By Date = Cancel Date − 3 days
Step 2 — Review & Edit
Editable table — one row per SKU × DC · Buyer email editable · Zero-qty rows hidden
Human review gate before any data leaves the browser
↓ User confirms →
Step 3 — QC & Export
Per-order total verification → flagging → Extensiv-formatted CSV download
18-column template · Quoted zip codes · Named Skubana_PO_<Number>_Formatted.csv
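The Step 1 join can be sketched in a few lines of JavaScript. This is a minimal illustration, not the production parser; every field name here is an assumption.

```javascript
// Sketch of the Step 1 join: header line items (Vendor Style → SKU, unit cost)
// matched to distribution rows (Vendor Style → DC, quantity).
function joinSections(headerItems, distRows) {
  // Index header items by Vendor Style for O(1) lookup
  const byStyle = new Map(headerItems.map(h => [h.vendorStyle, h]));
  return distRows.map(row => {
    const header = byStyle.get(row.vendorStyle);
    // Unmatched styles are flagged for the review table, not dropped
    if (!header) return { ...row, flagged: 'unmatched style' };
    return {
      vendorStyle: row.vendorStyle,
      sku: header.sku,
      unitCost: header.unitCost,
      dc: row.dc,
      qty: row.qty,
    };
  });
}
```

Flagging rather than dropping the unmatched row is what feeds the fault-tolerant review behavior described later in this case study.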

Three layout bugs that required format-aware fixes

TJX/MarMaxx PO PDFs are not structured data exports — they're print-formatted documents extracted as raw text. The parser had to be hardened against three distinct layout failure modes discovered in production documents before the output was reliable.

01
PDF.js splitting SKU codes across lines
The text extractor sometimes broke an 8-character SKU like SKU-001 into two tokens on adjacent lines. The fix is a line-joining regex that recognizes partial SKU fragments and re-concatenates them before the parser runs — preventing phantom "unmatched style" errors on valid codes.
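A sketch of the line-joining repair, assuming an illustrative SKU-### code format (the real pattern is redacted in this case study):

```javascript
// Re-join SKU codes that PDF.js split across two extracted lines:
// a partial fragment at the end of one line ("... SKU-0") plus the
// completing digits at the start of the next ("01 ...") are concatenated
// before the parser runs.
function repairSplitSkus(lines) {
  const repaired = [];
  for (let i = 0; i < lines.length; i++) {
    const cur = lines[i];
    const next = lines[i + 1];
    // Fragment heuristic: line ends mid-SKU, next line starts with digits
    if (next && /SKU-\d{0,2}$/.test(cur) && /^\d+/.test(next)) {
      const tail = next.match(/^\d+/)[0];
      repaired.push(cur + tail);
      lines[i + 1] = next.slice(tail.length).trimStart();
    } else {
      repaired.push(cur);
    }
  }
  // Drop lines fully consumed by the re-join
  return repaired.filter(l => l.length > 0);
}
```

Note the anchor `$` in the fragment regex: a complete SKU at the end of a line never matches, so valid codes are left untouched.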
02
All HomeGoods DCs appearing on a single line
HomeGoods distribution pages use a columnar layout where every DC and its quantities land on one extracted text line rather than one per DC. The fix is positional array pairing — splitting the line on known delimiters and zipping DC identifiers with their corresponding quantity values by index position.
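Positional array pairing can be sketched like this, assuming whitespace-delimited tokens with all DC codes first and all quantities after (the actual delimiters in the HomeGoods layout may differ):

```javascript
// Pair DC identifiers with quantities by index position when a columnar
// layout puts every DC and every quantity on one extracted text line.
function pairColumnarDcs(line) {
  const tokens = line.trim().split(/\s+/);
  // First half of the tokens are DC codes, second half are quantities
  const half = tokens.length / 2;
  const dcs = tokens.slice(0, half);
  const qtys = tokens.slice(half);
  // Zip the two arrays by index
  return dcs.map((dc, i) => ({ dc, qty: Number(qtys[i]) }));
}
```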
03
CONSOLIDATOR cancel date on the next line after its label
The Marshalls CONSOLIDATOR block puts the cancel date value on the line immediately following the "Cancel Date:" label rather than on the same line. A simple regex match returns empty. The fix is index-based lookahead: when the label is found, read line[index + 1] for the value — and Ship By Date is then calculated as cancel minus 3 calendar days.
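A sketch of the lookahead plus the Ship By calculation; the "Cancel Date:" label and mm/dd/yyyy format follow the description above, everything else is illustrative:

```javascript
// Find the cancel date on the same line as its label, or on the line
// immediately after it (the Marshalls CONSOLIDATOR case), then compute
// Ship By Date = Cancel Date − 3 calendar days.
function extractCancelAndShipBy(lines) {
  const idx = lines.findIndex(l => l.includes('Cancel Date:'));
  if (idx === -1) return null;
  // Try the same line first, then fall back to index-based lookahead
  const sameLine = lines[idx].match(/Cancel Date:\s*(\d{2}\/\d{2}\/\d{4})/);
  const nextLine = lines[idx + 1]?.match(/(\d{2}\/\d{2}\/\d{4})/);
  const cancel = sameLine?.[1] ?? nextLine?.[1];
  if (!cancel) return null;
  const [mm, dd, yyyy] = cancel.split('/').map(Number);
  const d = new Date(yyyy, mm - 1, dd);
  d.setDate(d.getDate() - 3); // Date handles month/year rollover
  const pad = n => String(n).padStart(2, '0');
  const shipBy = `${pad(d.getMonth() + 1)}/${pad(d.getDate())}/${d.getFullYear()}`;
  return { cancelDate: cancel, shipByDate: shipBy };
}
```

Using `Date.setDate` for the subtraction means month and year boundaries roll over correctly with no special-casing.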

What the QC screen looks like before export

After the parser runs, the user lands on a review table with every parsed row editable inline. The QC panel runs three checks automatically: per-order total verification, missing unit price detection, and date field validation. The export button only activates once QC passes or the user explicitly overrides a flag.
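The three automatic checks might look roughly like this; row shape and flag labels are assumptions, not the production code:

```javascript
// Run the three QC checks: per-order total verification, missing unit
// price detection, and date field validation. Returns a list of flags;
// the export button enables only when the list is empty (or overridden).
function runQcChecks(rows) {
  const flags = [];
  // Sum row totals (qty × unit price) per order number
  const totals = new Map();
  for (const r of rows) {
    totals.set(r.orderNumber, (totals.get(r.orderNumber) ?? 0) + r.qty * r.unitPrice);
  }
  rows.forEach((r, i) => {
    if (r.unitPrice == null) flags.push({ row: i, check: 'missing unit price' });
    if (!/^\d{2}\/\d{2}\/\d{4}$/.test(r.shipByDate)) flags.push({ row: i, check: 'bad date' });
    // Small tolerance absorbs floating-point rounding
    if (Math.abs(totals.get(r.orderNumber) - r.orderTotal) > 0.005) {
      flags.push({ row: i, check: 'order total mismatch' });
    }
  });
  return flags;
}
```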

PO Processor — Step 3: QC & Export  ·  PO XXXXXX (TJX) + PO XXXXXX (Marshalls)
① Upload PDFs
② Review & Edit
③ QC & Export
Parsed order summary
PO XXXXXX (TJX): 8 rows · 4 DCs · 3 SKUs
PO XXXXXX (Marshalls): 6 rows · 3 DCs · 3 SKUs
Total export rows: 14
Order totals verification
Order Number | DC | SKU | Qty | Unit Price | Row Total | Order Total | Check
91-XXXXXX | Retailer DC [A] | SKU-001 | 24 | [redacted] | [redacted] | [redacted] | ✓ PASS
91-XXXXXX | Retailer DC [A] | SKU-002 | 24 | [redacted] | [redacted] | [redacted] | ✓ PASS
96-XXXXXX | Retailer DC [B] | SKU-001 | 12 | [redacted] | [redacted] | [redacted] | ✓ PASS
85-XXXXXX | Retailer DC [C] | SKU-003 | 18 | [redacted] | [redacted] | [redacted] | ✓ PASS
QC results — all checks passed
· All order totals verified: 14/14 rows match
· No missing unit prices: all SKUs matched
· All dates present and formatted mm/dd/yyyy
· Zero-quantity rows excluded from export: 2 rows suppressed
· Buyer email [email protected]: confirmed
Export
· Skubana_Merged_Apr2026.csv · 14 rows · ready to upload to Extensiv

Distribution center logic baked into the parser

The parser maintains a lookup table mapping distribution prefixes to their retailer and DC identity. Order numbers are automatically formatted as XX-<HeaderPO>. DC names in the Ship To Name column always have the DC number appended, eliminating any ambiguity in the Extensiv order list.

Prefix range | Retailer | Notes
881–887, 890 | HomeGoods | Columnar layout — positional array pairing required
891, 896–898 | TJX | Standard distribution block format
6610 | TJX Chino | 4-digit prefix — special-cased in order number formatter
885–886 | Marshalls | CONSOLIDATOR block — cancel date on next-line lookahead
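A sketch of how the prefix lookup and field formatting could be wired up, using the ranges from the table above. The check order matters because the Marshalls subrange sits inside the broader HomeGoods range; the two-digit order-number prefix is taken as an input here rather than derived, since that mapping is not spelled out in this case study:

```javascript
// Map a DC prefix to its retailer. Narrower ranges are checked first.
function resolveRetailer(prefix) {
  if (prefix === '6610') return 'TJX Chino'; // 4-digit special case
  if (prefix === '885' || prefix === '886') return 'Marshalls';
  if ((prefix >= '881' && prefix <= '887') || prefix === '890') return 'HomeGoods';
  if (prefix === '891' || (prefix >= '896' && prefix <= '898')) return 'TJX';
  return null; // unrecognized prefix → flag the row in review
}

// Build the Extensiv order number and Ship To Name fields
function formatOrderFields(orderPrefix, headerPo, dcName, dcNumber) {
  return {
    orderNumber: `${orderPrefix}-${headerPo}`, // XX-<HeaderPO>, e.g. "91-123456"
    shipToName: `${dcName} ${dcNumber}`,       // DC number appended to the name
  };
}
```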

Runs in the browser on infrastructure the brand already uses

The entire parsing and export pipeline is client-side. No upload, no server, no credentials at risk. The Extensiv API integration adds a thin Vercel serverless proxy as the only backend component — credentials stay server-side, the browser never sees them.

Layer | Tool | Notes
PDF extraction | PDF.js | Client-side text extraction · no file upload · works offline
UI framework | React | Three-step wizard · inline editable review table · warm peach/cream design system
Parsing logic | Vanilla JS | Layout-aware: columnar HomeGoods, line-split SKU repair, next-line date lookahead
Export format | CSV (quoted) | Zip codes quoted · dates as text · 18-column Extensiv template · named per spec
WMS | Extensiv (Skubana) | Order upload target · OAuth2 integration via Vercel proxy (Phase 2)
Hosting | Vercel | Static React app + serverless proxy for Extensiv API credentials

What gets built next

The v1 pipeline covers the full lifecycle from PDF upload through validated CSV export. The next phase connects the output directly to Extensiv via API, eliminating the manual upload step entirely — and expands the platform to support other brands and WMS systems as a SaaS offering.

Near term
Extensiv API integration — direct order push via Vercel serverless proxy
OAuth2 credential management — credentials never leave the server
Push confirmation in UI — Extensiv order ID returned and displayed
Error surfacing — Extensiv rejections shown inline with row-level context
Longer term (SaaS)
Multi-tenant adapter architecture — ShipStation, NetSuite, other WMS targets
Warehouse Invoice Auditor — AI-powered 3PL invoice reconciliation module
Brand onboarding + subscription tiers — SaaS pricing model
Additional retailer parsers — Target, Walmart, Nordstrom PO formats

What shaped the architecture

01
Human review is a feature, not a gap
The tool deliberately stops at an editable review table before export. The buyer can correct a quantity, update the buyer email, or remove a row before anything reaches Extensiv. The parser handles the mechanical work; the human handles the judgment call. This also surfaces any parsing edge cases before they cause WMS problems.
02
Client-side parsing eliminates infrastructure and privacy risk
PO documents contain vendor cost data that should never leave the buyer's machine. Running PDF.js in the browser means no server upload, no logging, and no third-party data handling — the file stays local from drop to export. The only external call in Phase 2 is the Extensiv API push, which goes through a credential-holding Vercel proxy.
03
Zero-quantity row suppression is a WMS correctness requirement
Distribution pages list every DC in the TJX network even when a given PO allocates zero units to most of them. Exporting those rows creates phantom orders in Extensiv. The parser filters them before the review table renders — the buyer never sees noise, and the WMS never receives it.
04
Zip code quoting survives Excel without any user action
Excel silently strips leading zeros from numeric-looking CSV fields on open. Quoting zip codes as strings (“01234”) forces Excel to treat them as text. The fix is invisible to the user and permanent in the output — no reformatting step, no instructions to remember.
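The quoting rule described above can be sketched as a single field formatter (function name and options are illustrative):

```javascript
// Emit one CSV field. Zip codes are always force-quoted so they survive
// as text; other fields are quoted only when they contain CSV
// metacharacters, with embedded quotes doubled per the usual convention.
function csvField(value, { forceQuote = false } = {}) {
  const s = String(value);
  if (forceQuote || /[",\n]/.test(s)) {
    return `"${s.replace(/"/g, '""')}"`;
  }
  return s;
}
```

A zip of `01234` is emitted as `"01234"`, with no user action required.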
05
Total Amount Paid is order-level, not row-level
Extensiv expects the same order total repeated on every row that shares an order number — it's a flat file format constraint, not a summation row. The export layer groups rows by order number, calculates the sum once, then writes that value to every row in the group. QC verifies it before the download fires.
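A sketch of the group-then-repeat step (field names are assumptions):

```javascript
// Sum each order's row totals once, then write the same Total Amount Paid
// onto every row sharing that order number, per the flat-file constraint.
function applyOrderTotals(rows) {
  const totals = new Map();
  for (const r of rows) {
    totals.set(r.orderNumber, (totals.get(r.orderNumber) ?? 0) + r.qty * r.unitPrice);
  }
  return rows.map(r => ({ ...r, totalAmountPaid: totals.get(r.orderNumber) }));
}
```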
06
Fault-tolerant parsing returns partial data rather than failing silently
When the parser can't confidently extract a field — a missing cancel date, an unrecognized DC prefix, a split SKU it can't reassemble — it flags the row in the review table rather than skipping it or guessing. The buyer sees exactly what was uncertain and can correct it before export.