PII Never Leaves the Building

Some AI workflows are bounded by a hard rule: sensitive data cannot leave on-prem infrastructure. That constraint isn't a problem to design around — it's a clarifying force. Here is the pattern we keep landing on, and the discipline of cutting every lane we can't justify with real data.

Some AI work is bounded by a hard rule: certain data cannot leave the building. Not to a frontier-model API, not to a cloud OCR service, not to a third-party vector database. The whole pipeline runs on hardware the client controls, or it doesn't run at all. Treated as a problem, that's annoying. Treated as a design input, it's the most useful one you'll get — it rules out the patterns you might otherwise reach for by reflex, so the patterns that survive have already paid for themselves.

Three moves under the boundary

The shape of the discipline

Inside the boundary, three moves in order. Cut what you can't justify. Inherit what already works. Anchor the whole thing to a human who can be wrong on purpose.

MOVE 1 Cut the lane you're not using

First instinct on a document pipeline: write a path for every input shape — scans, photos, faxes, the whole zoo. Second instinct, after asking the operator what they actually receive: 100% digital PDFs with a real text layer. The OCR lane was theoretical. Cut it. The inspector becomes a single-path gate — digital advances, anything else quarantines for human triage. Day a scan actually arrives, add the lane back or push OCR upstream into intake. Until then, YAGNI.

MOVE 2 Inherit, don't reinvent

Inside the boundary, the client already runs a hardened redactor — fourteen recognizers tuned against real production traffic, a precision-first threshold, a deny-list seeded from their own staff table. We considered designing our own. Discarded the idea inside an hour. Wrap what works. The middle is solved; the seams — orchestration, audit, the bridge to the human reviewer — are where we earn our keep.

MOVE 3 Keep the human gate. Mandatory.

Every document passes through a licensed reviewer before release. Not a regression — the load-bearing trust contract of the whole system. The reviewer sees the redacted PDF next to a span ledger (every redaction, every recognizer, every confidence score) and approves, requests more redactions, or releases an over-redaction. The pipeline's job is to shorten their per-document time, not eliminate them. Auto-approve is banned.

What it looks like

On-prem pipeline · 8 phases · 11 nodes

Phase 1 · INTAKE
   📁 Local Drive · PDF Inbox
                 │
Phase 2 · INSPECT
   🔍 Inspect (PyMuPDF)            has text layer? page count?
                 │
   🚦 Gate · digital-only          digital → next
                 │                  scan ╌╌► quarantine ⚠
Phase 3 · EXTRACT
   📄 Docling (digital)            text + tree + tables
                 │
Phase 4 · STRUCTURE
   🤖 Classify + MD skeleton       local LLM · on-prem
                 │
Phase 5 · DETECT     ◄────────────╮  (request more redactions)
   🛡️  Detect PII                   inherited recognizers
                 │                  precision-first threshold
Phase 6 · REDACT
   ✂️  In-place patch + verify      text-extract audit
                 │ ╌╌► detect       3 strikes → quarantine
Phase 7 · EMIT
   💾 Sanitized MD + redacted PDF   atomic write + audit row
                 │
   🧑‍⚖️  Staff review · CPA / EA      ≤30 min in season
                 │                  decision → signal row
Phase 8 · END
   🏁 Done — downstream reads the sanitized markdown

Linear with one feedback loop (verify-fail → detect). The dashed return from review back to detection is what makes this "agentic" rather than a pipeline — the human gate can send work back, not just forward.

The cuts

Two design decisions worth naming, both counterintuitive to anyone who reads too many architecture diagrams.

What we cut · and why

Component	Why it looked good	Why it's gone	Revisit when
OCR scan lane	Handle any input shape	Intake is 100% digital. Lane unused.	A scan actually arrives
LLM arbiter	Triage low-confidence spans for the reviewer	No data yet that recognizers leave enough noise to be worth a model	Review time > SLA or miss-rate > floor
2nd-pass OCR verify	Catch PII hiding in raster pixels	Digital-only intake — no pixels for PII to hide in	OCR lane comes back

Each cut is documented in the spec with an explicit revisit trigger. Deleted-with-receipts beats deleted-and-forgotten.

Every line of a system is a tax on the system. The discipline isn't writing the line — it's deleting the line you can't justify with a number, even when it sounds responsible.

Why the boundary helps

The frame I'd push back on is "on-prem is the compromise we make for compliance." It is not — it's a forcing function that prevents three failure modes we see often in cloud-first projects.

Blocks scope creep into the model layer. When you can't call a frontier API mid-pipeline, you stop trying to. Deterministic code stays deterministic. The model is reserved for the one or two steps where pattern-matching beats rules — and the model is whatever runs on hardware you control, which is usually small. Smaller models force tighter prompts. Tighter prompts force clearer thinking.

Surfaces what you actually need from the human. You can't punt the hard cases to a cloud arbiter you don't own. The hard cases land on the reviewer's desk — so the rest of the system gets designed to respect their time: rank the queue, surface the rationale, default to under-redaction so their instinct (catch the miss) matches the system's.

Makes the audit story trivial. Every byte of PII lives on a machine the client owns. Every reviewer decision lives in a row in their database. The compliance conversation collapses into "show us the audit table" — and the table is there, because the pipeline can't work without it.

The pattern travels

Specifics are domain-bound — tax forms, a specific redactor, a specific reviewer role. The shape is general. Anywhere data residency is non-negotiable, the design collapses to the same three moves: cut the lane you don't need, inherit the part that works, keep the human gate. Anywhere it doesn't, the same discipline is still good engineering — the constraint just isn't doing the work of forcing it on you.

PII never leaves the building.

Three moves under the boundary

What it looks like

The cuts

Why the boundary helps

The pattern travels