Specialty Retail Brand — TJX/MarMaxx PO Processor Case Study

Automated retailer PO
processing & upload

How we replaced a manual, error-prone PO conversion workflow with a browser-based tool that parses TJX/HomeGoods/Marshalls PDFs, validates every line, and exports Extensiv-ready upload files in three steps — with a human review gate before anything touches the WMS.

Built for: Specialty Retail Brand
Retail channels: TJX · HomeGoods · Marshalls
Stack: React · PDF.js · Extensiv API
Status: Core tool live — API integration next
3 · Step workflow: upload, review, export
18 · Extensiv template columns, fully populated
30+ · DC locations parsed automatically
0 · Manual reformatting steps
100% · Browser-based — no server, no install
45 min → 3 min · Per PO, start to Extensiv-ready export

45 minutes of manual work per PO — brittle, slow, and invisible to QC

Processing a single TJX purchase order took roughly 45 minutes from PDF to Extensiv-ready CSV. TJX Companies sends purchase orders as multi-page PDFs: a header page with styles, quantities, and unit costs, followed by a distribution section that breaks those quantities down by destination DC. To get those orders into Extensiv (formerly Skubana), someone had to manually cross-reference the two sections, build one row per SKU per DC, calculate order totals, and format 18 columns correctly every time.

Without a tool, this process lived in spreadsheets. DC addresses were typed by hand. Unit prices were looked up from the header and matched to each distribution row manually. Date fields were reformatted to survive Excel's CSV parser. Any mistake — a transposed quantity, a mismatched SKU, a dropped leading zero on a zip code — flowed directly into Extensiv and required a manual correction. The same PO now processes in under 3 minutes.

Before — ~45 minutes
Header costs manually matched to distribution rows for every SKU
DC addresses typed from PDF — no address lookup or validation
Date fields reformatted by hand and re-broken by Excel on open
Leading zeros on zip codes silently stripped by Excel CSV
No per-order total verification before upload
Multi-PO merges require copy-paste across separate spreadsheets
After — ~3 minutes
PDF parser extracts header and distribution sections; joins by Vendor Style automatically
DC addresses parsed from PO text with DC number appended to name
Dates stored as mm/dd/yyyy text — survive Excel open without reformatting
Zip codes quoted in CSV output — leading zeros always preserved
Total Amount Paid calculated and repeated across every row of each order
Multiple POs merged in one export session — single combined file

Three steps, one human review gate, zero server

The tool runs entirely in the browser. PDF.js extracts text client-side; a parsing pipeline handles the layout variation across TJX, HomeGoods, and Marshalls documents; the result lands in an editable review table before anything is exported. The buyer never touches a spreadsheet until the file is done.

Input
User uploads one or more PO PDFs via drag-and-drop
Accepts main header PO and distribution/breakdown pages in any combination
Step 1 — PDF Parser
PDF.js extracts raw text → layout-aware parser identifies header and distribution sections
Handles columnar DC layouts · SKU line-split repair · cancel-date lookahead · multi-PO merging
Header extractor
Vendor Style → Item SKU + Unit Cost from header PO line items
Start Ship / Cancel dates · PO number · Order date
Distribution extractor
2-digit DC prefix → address + quantities per Vendor Style
DC name + number · Address split into 5 columns · Country code
↓ Joined by Vendor Style · Ship By Date = Cancel Date − 3 days
Step 2 — Review & Edit
Editable table — one row per SKU × DC · Buyer email editable · Zero-qty rows hidden
Human review gate before any data leaves the browser
↓ User confirms →
Step 3 — QC & Export
Per-order total verification → flagging → Extensiv-formatted CSV download
18-column template · Quoted zip codes · Named Skubana_PO_<Number>_Formatted.csv
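The Step 1 join can be sketched in a few lines of JavaScript. This is a minimal illustration, not the production parser; every field name here is an assumption.

```javascript
// Sketch of the Step 1 join: header line items (Vendor Style → SKU, unit cost)
// matched to distribution rows (Vendor Style → DC, quantity).
function joinSections(headerItems, distRows) {
  // Index header items by Vendor Style for O(1) lookup
  const byStyle = new Map(headerItems.map(h => [h.vendorStyle, h]));
  return distRows.map(row => {
    const header = byStyle.get(row.vendorStyle);
    // Unmatched styles are flagged for the review table, not dropped
    if (!header) return { ...row, flagged: 'unmatched style' };
    return {
      vendorStyle: row.vendorStyle,
      sku: header.sku,
      unitCost: header.unitCost,
      dc: row.dc,
      qty: row.qty,
    };
  });
}
```

Flagging rather than dropping the unmatched row is what feeds the fault-tolerant review behavior described later in this case study.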

Three layout bugs that required format-aware fixes

TJX/MarMaxx PO PDFs are not structured data exports — they're print-formatted documents extracted as raw text. The parser had to be hardened against three distinct layout failure modes discovered in production documents before the output was reliable.

01
PDF.js splitting SKU codes across lines
The text extractor sometimes broke an 8-character SKU like SKU-001 into two tokens on adjacent lines. The fix is a line-joining regex that recognizes partial SKU fragments and re-concatenates them before the parser runs — preventing phantom "unmatched style" errors on valid codes.
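A sketch of the line-joining repair, assuming an illustrative SKU-### code format (the real pattern is redacted in this case study):

```javascript
// Re-join SKU codes that PDF.js split across two extracted lines:
// a partial fragment at the end of one line ("... SKU-0") plus the
// completing digits at the start of the next ("01 ...") are concatenated
// before the parser runs.
function repairSplitSkus(lines) {
  const repaired = [];
  for (let i = 0; i < lines.length; i++) {
    const cur = lines[i];
    const next = lines[i + 1];
    // Fragment heuristic: line ends mid-SKU, next line starts with digits
    if (next && /SKU-\d{0,2}$/.test(cur) && /^\d+/.test(next)) {
      const tail = next.match(/^\d+/)[0];
      repaired.push(cur + tail);
      lines[i + 1] = next.slice(tail.length).trimStart();
    } else {
      repaired.push(cur);
    }
  }
  // Drop lines fully consumed by the re-join
  return repaired.filter(l => l.length > 0);
}
```

Note the anchor `$` in the fragment regex: a complete SKU at the end of a line never matches, so valid codes are left untouched.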
02
All HomeGoods DCs appearing on a single line
HomeGoods distribution pages use a columnar layout where every DC and its quantities land on one extracted text line rather than one per DC. The fix is positional array pairing — splitting the line on known delimiters and zipping DC identifiers with their corresponding quantity values by index position.
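Positional array pairing can be sketched like this, assuming whitespace-delimited tokens with all DC codes first and all quantities after (the actual delimiters in the HomeGoods layout may differ):

```javascript
// Pair DC identifiers with quantities by index position when a columnar
// layout puts every DC and every quantity on one extracted text line.
function pairColumnarDcs(line) {
  const tokens = line.trim().split(/\s+/);
  // First half of the tokens are DC codes, second half are quantities
  const half = tokens.length / 2;
  const dcs = tokens.slice(0, half);
  const qtys = tokens.slice(half);
  // Zip the two arrays by index
  return dcs.map((dc, i) => ({ dc, qty: Number(qtys[i]) }));
}
```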
03
CONSOLIDATOR cancel date on the next line after its label
The Marshalls CONSOLIDATOR block puts the cancel date value on the line immediately following the "Cancel Date:" label rather than on the same line. A simple regex match returns empty. The fix is index-based lookahead: when the label is found, read line[index + 1] for the value — and Ship By Date is then calculated as cancel minus 3 calendar days.
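A sketch of the lookahead plus the Ship By calculation; the "Cancel Date:" label and mm/dd/yyyy format follow the description above, everything else is illustrative:

```javascript
// Find the cancel date on the same line as its label, or on the line
// immediately after it (the Marshalls CONSOLIDATOR case), then compute
// Ship By Date = Cancel Date − 3 calendar days.
function extractCancelAndShipBy(lines) {
  const idx = lines.findIndex(l => l.includes('Cancel Date:'));
  if (idx === -1) return null;
  // Try the same line first, then fall back to index-based lookahead
  const sameLine = lines[idx].match(/Cancel Date:\s*(\d{2}\/\d{2}\/\d{4})/);
  const nextLine = lines[idx + 1]?.match(/(\d{2}\/\d{2}\/\d{4})/);
  const cancel = sameLine?.[1] ?? nextLine?.[1];
  if (!cancel) return null;
  const [mm, dd, yyyy] = cancel.split('/').map(Number);
  const d = new Date(yyyy, mm - 1, dd);
  d.setDate(d.getDate() - 3); // Date handles month/year rollover
  const pad = n => String(n).padStart(2, '0');
  const shipBy = `${pad(d.getMonth() + 1)}/${pad(d.getDate())}/${d.getFullYear()}`;
  return { cancelDate: cancel, shipByDate: shipBy };
}
```

Using `Date.setDate` for the subtraction means month and year boundaries roll over correctly with no special-casing.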

What the QC screen looks like before export

After the parser runs, the user lands on a review table with every parsed row editable inline. The QC panel runs three checks automatically: per-order total verification, missing unit price detection, and date field validation. The export button only activates once QC passes or the user explicitly overrides a flag.
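The three automatic checks might look roughly like this; row shape and flag labels are assumptions, not the production code:

```javascript
// Run the three QC checks: per-order total verification, missing unit
// price detection, and date field validation. Returns a list of flags;
// the export button enables only when the list is empty (or overridden).
function runQcChecks(rows) {
  const flags = [];
  // Sum row totals (qty × unit price) per order number
  const totals = new Map();
  for (const r of rows) {
    totals.set(r.orderNumber, (totals.get(r.orderNumber) ?? 0) + r.qty * r.unitPrice);
  }
  rows.forEach((r, i) => {
    if (r.unitPrice == null) flags.push({ row: i, check: 'missing unit price' });
    if (!/^\d{2}\/\d{2}\/\d{4}$/.test(r.shipByDate)) flags.push({ row: i, check: 'bad date' });
    // Small tolerance absorbs floating-point rounding
    if (Math.abs(totals.get(r.orderNumber) - r.orderTotal) > 0.005) {
      flags.push({ row: i, check: 'order total mismatch' });
    }
  });
  return flags;
}
```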

PO Processor — Step 3: QC & Export  ·  PO XXXXXX (TJX) + PO XXXXXX (Marshalls)
① Upload PDFs
② Review & Edit
③ QC & Export
Parsed order summary
PO XXXXXX (TJX): 8 rows · 4 DCs · 3 SKUs
PO XXXXXX (Marshalls): 6 rows · 3 DCs · 3 SKUs
Total export rows: 14
Order totals verification
Order Number | DC | SKU | Qty | Unit Price | Row Total | Order Total | Check
91-XXXXXX | Retailer DC [A] | SKU-001 | 24 | [redacted] | [redacted] | [redacted] | ✓ PASS
91-XXXXXX | Retailer DC [A] | SKU-002 | 24 | [redacted] | [redacted] | [redacted] | ✓ PASS
96-XXXXXX | Retailer DC [B] | SKU-001 | 12 | [redacted] | [redacted] | [redacted] | ✓ PASS
85-XXXXXX | Retailer DC [C] | SKU-003 | 18 | [redacted] | [redacted] | [redacted] | ✓ PASS
QC results — all checks passed
· All order totals verified: 14/14 rows match
· No missing unit prices: all SKUs matched
· All dates present and formatted mm/dd/yyyy
· Zero-quantity rows excluded from export: 2 rows suppressed
· Buyer email [email protected]: confirmed
Export
· Skubana_Merged_Apr2026.csv · 14 rows · ready to upload to Extensiv

Distribution center logic baked into the parser

The parser maintains a lookup table mapping distribution prefixes to their retailer and DC identity. Order numbers are automatically formatted as XX-<HeaderPO>. DC names in the Ship To Name column always have the DC number appended, eliminating any ambiguity in the Extensiv order list.

Prefix range | Retailer | Notes
881–887, 890 | HomeGoods | Columnar layout — positional array pairing required
891, 896–898 | TJX | Standard distribution block format
6610 | TJX Chino | 4-digit prefix — special-cased in order number formatter
885–886 | Marshalls | CONSOLIDATOR block — cancel date on next-line lookahead
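A sketch of how the prefix lookup and field formatting could be wired up, using the ranges from the table above. The check order matters because the Marshalls subrange sits inside the broader HomeGoods range; the two-digit order-number prefix is taken as an input here rather than derived, since that mapping is not spelled out in this case study:

```javascript
// Map a DC prefix to its retailer. Narrower ranges are checked first.
function resolveRetailer(prefix) {
  if (prefix === '6610') return 'TJX Chino'; // 4-digit special case
  if (prefix === '885' || prefix === '886') return 'Marshalls';
  if ((prefix >= '881' && prefix <= '887') || prefix === '890') return 'HomeGoods';
  if (prefix === '891' || (prefix >= '896' && prefix <= '898')) return 'TJX';
  return null; // unrecognized prefix → flag the row in review
}

// Build the Extensiv order number and Ship To Name fields
function formatOrderFields(orderPrefix, headerPo, dcName, dcNumber) {
  return {
    orderNumber: `${orderPrefix}-${headerPo}`, // XX-<HeaderPO>, e.g. "91-123456"
    shipToName: `${dcName} ${dcNumber}`,       // DC number appended to the name
  };
}
```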

Runs in the browser on infrastructure the brand already uses

The entire parsing and export pipeline is client-side. No upload, no server, no credentials at risk. The Extensiv API integration adds a thin Vercel serverless proxy as the only backend component — credentials stay server-side, the browser never sees them.

Layer | Tool | Notes
PDF extraction | PDF.js | Client-side text extraction · no file upload · works offline
UI framework | React | Three-step wizard · inline editable review table · warm peach/cream design system
Parsing logic | Vanilla JS | Layout-aware: columnar HomeGoods, line-split SKU repair, next-line date lookahead
Export format | CSV (quoted) | Zip codes quoted · dates as text · 18-column Extensiv template · named per spec
WMS | Extensiv (Skubana) | Order upload target · OAuth2 integration via Vercel proxy (Phase 2)
Hosting | Vercel | Static React app + serverless proxy for Extensiv API credentials

What gets built next

The v1 pipeline covers the full lifecycle from PDF upload through validated CSV export. The next phase connects the output directly to Extensiv via API, eliminating the manual upload step entirely — and expands the platform to support other brands and WMS systems as a SaaS offering.

Near term
Extensiv API integration — direct order push via Vercel serverless proxy
OAuth2 credential management — credentials never leave the server
Push confirmation in UI — Extensiv order ID returned and displayed
Error surfacing — Extensiv rejections shown inline with row-level context
Longer term (SaaS)
Multi-tenant adapter architecture — ShipStation, NetSuite, other WMS targets
Warehouse Invoice Auditor — AI-powered 3PL invoice reconciliation module
Brand onboarding + subscription tiers — SaaS pricing model
Additional retailer parsers — Target, Walmart, Nordstrom PO formats

What shaped the architecture

01
Human review is a feature, not a gap
The tool deliberately stops at an editable review table before export. The buyer can correct a quantity, update the buyer email, or remove a row before anything reaches Extensiv. The parser handles the mechanical work; the human handles the judgment call. This also surfaces any parsing edge cases before they cause WMS problems.
02
Client-side parsing eliminates infrastructure and privacy risk
PO documents contain vendor cost data that should never leave the buyer's machine. Running PDF.js in the browser means no server upload, no logging, and no third-party data handling — the file stays local from drop to export. The only external call in Phase 2 is the Extensiv API push, which goes through a credential-holding Vercel proxy.
03
Zero-quantity row suppression is a WMS correctness requirement
Distribution pages list every DC in the TJX network even when a given PO allocates zero units to most of them. Exporting those rows creates phantom orders in Extensiv. The parser filters them before the review table renders — the buyer never sees noise, and the WMS never receives it.
04
Zip code quoting survives Excel without any user action
Excel silently strips leading zeros from numeric-looking CSV fields on open. Quoting zip codes as strings (“01234”) forces Excel to treat them as text. The fix is invisible to the user and permanent in the output — no reformatting step, no instructions to remember.
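The quoting rule described above can be sketched as a single field formatter (function name and options are illustrative):

```javascript
// Emit one CSV field. Zip codes are always force-quoted so they survive
// as text; other fields are quoted only when they contain CSV
// metacharacters, with embedded quotes doubled per the usual convention.
function csvField(value, { forceQuote = false } = {}) {
  const s = String(value);
  if (forceQuote || /[",\n]/.test(s)) {
    return `"${s.replace(/"/g, '""')}"`;
  }
  return s;
}
```

A zip of `01234` is emitted as `"01234"`, with no user action required.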
05
Total Amount Paid is order-level, not row-level
Extensiv expects the same order total repeated on every row that shares an order number — it's a flat file format constraint, not a summation row. The export layer groups rows by order number, calculates the sum once, then writes that value to every row in the group. QC verifies it before the download fires.
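A sketch of the group-then-repeat step (field names are assumptions):

```javascript
// Sum each order's row totals once, then write the same Total Amount Paid
// onto every row sharing that order number, per the flat-file constraint.
function applyOrderTotals(rows) {
  const totals = new Map();
  for (const r of rows) {
    totals.set(r.orderNumber, (totals.get(r.orderNumber) ?? 0) + r.qty * r.unitPrice);
  }
  return rows.map(r => ({ ...r, totalAmountPaid: totals.get(r.orderNumber) }));
}
```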
06
Fault-tolerant parsing returns partial data rather than failing silently
When the parser can't confidently extract a field — a missing cancel date, an unrecognized DC prefix, a split SKU it can't reassemble — it flags the row in the review table rather than skipping it or guessing. The buyer sees exactly what was uncertain and can correct it before export.