Engineering · Apr 24, 2026

When Your Legal Document Parser Runs Out of Words

How we fixed silent JSON truncation in our legal document ingestion pipeline by doubling token limits and adding intelligent retry logic.

Sometimes the most critical bugs are the quietest. They don't crash your application or throw dramatic error messages. They just… stop halfway through, leaving you with incomplete data and no obvious trail to follow.

We ran into exactly this problem in Sabine's legal document ingestion pipeline. When processing dense contract text, our extraction would occasionally return malformed JSON—cut off mid-response, missing closing brackets, impossible to parse. The culprit? We'd hit Claude Haiku's token generation limits without realizing it.

The Problem: Silent Truncation

Our legal ingestion workflow extracts structured data from contracts, terms of service, privacy policies—the kind of documents that matter when you're building an AI partnership platform. We were using Claude Haiku with a max_tokens setting of 4096. For most documents, that was plenty.

But when a particularly dense 40,000-character contract came through—say, a multi-party enterprise SaaS agreement with extensive liability clauses—the model's extraction response would hit that 4096-token ceiling. The API doesn't error when this happens; it simply stops generating, sets the response's stop_reason to "max_tokens", and returns what it has. You get back valid text that's just… incomplete. Invalid JSON. Unparseable.
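A cheap way to catch the failure is to treat any unparseable extraction as a truncation signal. A minimal sketch of that check (the helper name is ours, not part of any SDK):

```python
import json

def looks_truncated(raw: str) -> bool:
    """Heuristic: an extraction cut off mid-response almost always
    fails to parse, because the closing brackets never arrived."""
    try:
        json.loads(raw)
        return False
    except json.JSONDecodeError:
        return True
```

In production you'd pair this with the API's own signal: a stop_reason of "max_tokens" on the response means the model hit the ceiling rather than finishing naturally.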

The Fix: Headroom + Intelligence

The immediate fix was straightforward: double the token limit to 8192. This gives Haiku enough headroom to extract even complex legal structures without running out of room. But we didn't stop there.
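In Anthropic Messages API terms, the change is a one-parameter bump. A sketch of the call (the model id and prompt wiring here are illustrative, not our exact code):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def extract_chunk(prompt: str) -> str:
    """Run one extraction call with the doubled output budget."""
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # illustrative model id
        max_tokens=8192,                  # was 4096; doubled for headroom
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```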

We added automatic retry logic. Now, if the pipeline detects a truncated response (malformed JSON, missing terminators), it doesn't just fail—it retries the extraction with the chunk size halved. If a 40k-character block is too dense, we split it into two 20k blocks and extract each separately. This adaptive approach means we handle edge cases gracefully without manual intervention.
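The retry loop itself fits in a few lines. A sketch under our assumptions, where call_model stands in for the real extraction call and a JSON parse failure is treated as the truncation signal:

```python
import json

def extract_with_retry(chunk: str, call_model, min_size: int = 5_000) -> list[dict]:
    """Extract structured data from `chunk`, halving and recursing
    whenever the response comes back as unparseable (truncated) JSON.

    `call_model` is a stand-in for the real model call: it takes a
    text chunk and returns the raw response string.
    """
    raw = call_model(chunk)
    try:
        return [json.loads(raw)]
    except json.JSONDecodeError:
        if len(chunk) <= min_size:
            raise  # too small to split further; surface the failure
        mid = len(chunk) // 2
        # A 40k-character block becomes two 20k blocks, and so on.
        return (extract_with_retry(chunk[:mid], call_model, min_size)
                + extract_with_retry(chunk[mid:], call_model, min_size))
```

Splitting at the raw midpoint can bisect a clause; in practice you'd snap the split to the nearest paragraph boundary, which this sketch elides.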

Why It Matters

Sabine users upload legal documents because they need accurate, structured understanding of their agreements. A silent failure here isn't just an annoyance—it's a trust issue. When ingestion works 95% of the time but quietly fails on the most complex documents, you erode confidence in the platform.

This fix eliminates that failure mode. Doubling the token budget, combined with the adaptive retry logic, means we handle the full spectrum of legal document complexity—from simple NDAs to byzantine enterprise contracts.

What's Next

We're monitoring extraction latency and success rates across document types to validate the impact. Early signals are positive: zero truncation errors since the deploy, with only a marginal increase in processing time for the largest documents.

Longer term, we're exploring adaptive chunking strategies that predict optimal chunk sizes based on document structure and content density. The goal is to minimize retries while maintaining perfect extraction fidelity—because every legal document matters.
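One shape such a heuristic could take—everything below is an invented illustration, not a shipped implementation—is estimating clause density from cheap surface features and shrinking the chunk accordingly:

```python
def adaptive_chunk_size(text: str, base: int = 40_000, floor: int = 10_000) -> int:
    """Hypothetical density heuristic: clause-dense legalese expands into
    more output tokens per input character, so shrink the chunk size as
    delimiter density rises. The thresholds are invented for illustration."""
    delimiters = text.count(";") + text.count("WHEREAS") + text.count("(a)")
    density = delimiters / max(len(text), 1)
    if density > 0.01:       # roughly one delimiter per 100 characters
        return max(base // 4, floor)
    if density > 0.005:
        return max(base // 2, floor)
    return base
```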

If you're building AI systems that process structured documents, pay attention to token limits not just on input but on output. Silent truncation is easy to miss in testing and brutal in production. Build headroom. Build recovery. Build trust.