Engineering · Mar 25, 2026

Smarter Contract Ingestion: Why Details Matter

We upgraded Sabine's legal document pipeline to extract contract numbers and customer information—a small change with big implications for partnership intelligence.

Legal agreements are dense. They're packed with clauses, exhibits, and cross-references that make extraction challenging. But buried in that complexity are a few critical pieces of metadata: contract numbers, customer names, addresses, and contact details. Until now, Sabine's legal ingest pipeline wasn't capturing them.

That changed with commit eac899a. We extended the legal document ingestion system to extract contract numbers and full customer information (name, address, and phone) from uploaded agreements. It's a straightforward enhancement on the surface, but it unlocks meaningful improvements in how Sabine organizes and references partnership data.

What Changed

The legal ingest pipeline (lib/agent/ingest/legal.py) now parses and stores four additional fields when processing contract PDFs:

  • Contract Number: The unique account or agreement identifier
  • Customer Name: Legal entity or individual party to the agreement
  • Customer Address: Mailing or billing address
  • Customer Phone: Primary contact number

These fields are extracted using the same LLM-powered parsing infrastructure that handles party identification and obligation extraction. The difference is specificity: alongside general party detection, we're now targeting structured identifiers that make contracts searchable and referenceable.
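
To make the shape of these four fields concrete, here is a minimal sketch of how extracted metadata might be represented and validated. It is not the code from lib/agent/ingest/legal.py: the ContractMetadata class, the parse_contract_metadata function, the JSON key names, and the assumption that the LLM returns a JSON object are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContractMetadata:
    """Hypothetical container for the four fields described in this post.

    Every field is optional because a given contract may omit any of them.
    """
    contract_number: Optional[str] = None
    customer_name: Optional[str] = None
    customer_address: Optional[str] = None
    customer_phone: Optional[str] = None


def parse_contract_metadata(llm_response: str) -> ContractMetadata:
    """Convert a JSON string (assumed to be the LLM's output) into ContractMetadata.

    Unknown keys are ignored. Missing, blank, or non-string values become None.
    Malformed or non-object JSON yields an empty ContractMetadata.
    """
    try:
        raw = json.loads(llm_response)
    except json.JSONDecodeError:
        return ContractMetadata()
    if not isinstance(raw, dict):
        return ContractMetadata()

    def clean(key: str) -> Optional[str]:
        value = raw.get(key)
        if not isinstance(value, str):
            return None
        # Collapse runs of whitespace and trim the ends.
        value = " ".join(value.split())
        return value or None

    return ContractMetadata(
        contract_number=clean("contract_number"),
        customer_name=clean("customer_name"),
        customer_address=clean("customer_address"),
        customer_phone=clean("customer_phone"),
    )
```

Treating every field as optional and falling back to an empty record on malformed output is one possible design choice: it keeps a single bad extraction from failing the whole ingest, at the cost of needing a separate check for records that came back empty.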

Why It Matters

Contract metadata isn't glamorous, but it's foundational. Without it, every document is just another PDF in a pile. With it, you have:

  • Instant lookup: Reference a contract by account number instead of scrolling through filenames
  • Customer context: See who you're partnered with and how to reach them, surfaced automatically
  • Future integrations: These fields lay the groundwork for CRM sync, automated notifications, and cross-contract analysis

This is the kind of change that doesn't announce itself loudly but quietly makes the product more reliable. It's infrastructure that pays dividends every time someone uploads a new agreement.

What's Next

Now that we're capturing contract identifiers, the next logical step is making them actionable. We're exploring:

  • Search and filter: Let users query contracts by customer name or account number directly from Sabine's interface
  • Duplicate detection: Flag when the same contract number appears in multiple uploads
  • Relationship mapping: Connect contracts to customer entities for a holistic view of partnership history
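
As an illustration of the duplicate detection idea, the sketch below groups uploads by a normalized contract number and reports any number that appears more than once. Nothing here is shipped: the function names, the (upload_id, contract_number) input shape, and the normalization rule (uppercase, letters and digits only) are assumptions made for the example.

```python
from collections import defaultdict


def normalize_contract_number(value: str) -> str:
    """Uppercase the value and keep only letters and digits (an assumed rule)."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def find_duplicate_contracts(uploads):
    """Map each normalized contract number to the uploads that share it.

    `uploads` is an iterable of (upload_id, contract_number) pairs. Entries
    with a missing or empty contract number are skipped. Only numbers seen
    in more than one upload are returned.
    """
    by_number = defaultdict(list)
    for upload_id, contract_number in uploads:
        if not contract_number:
            continue
        key = normalize_contract_number(contract_number)
        if key:
            by_number[key].append(upload_id)
    return {key: ids for key, ids in by_number.items() if len(ids) > 1}
```

Normalizing before comparing means "AC-1001" and "ac 1001" would be flagged as the same contract. Whether that is desirable depends on how contract numbers are formatted in practice, so a real implementation would likely need a rule tuned to the actual data.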

Legal data is only as useful as the structure you impose on it. This update is one more step toward making Sabine's partnership intelligence genuinely intelligent.

Source: commit eac899a by strugcity in sabine-super-agent