Sometimes the smallest bugs reveal the most about how AI systems work in production. We recently shipped a fix that solved a frustrating parsing issue in our intent decomposition pipeline—and it's a perfect example of the gap between what LLMs output and what our systems expect.
The Problem: When LLMs Get Too Helpful
Our intent decomposer is a critical component in Strug Works' orchestration layer. When a high-level directive comes in, the decomposer breaks it down into actionable tasks. It relies on structured JSON responses from our LLM to parse and route work correctly.
The issue? LLMs are trained on markdown-heavy data. When you ask for JSON, many models helpfully wrap it in markdown code fences—those triple backticks you see everywhere in documentation. So instead of clean JSON, we were getting responses like:
```json
{
"tasks": [...]
}
```

Our JSON parser didn't know what to do with those fence markers. Result: parsing errors, failed decompositions, and missions that couldn't start.
The Fix: Strip First, Parse Second
The solution is straightforward: before we attempt to parse the LLM response as JSON, we now strip out any markdown code fences. This preprocessing step ensures we're always working with clean, parseable JSON regardless of how the model decides to format its output.
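A minimal sketch of what that preprocessing step can look like. The function names and regex here are illustrative assumptions, not the actual Strug Works implementation:

```python
import json
import re

# Matches a leading fence (with optional language tag, e.g. ```json)
# or a trailing fence, so both get stripped before parsing.
FENCE_RE = re.compile(r"^```[\w-]*\n?|\n?```$")

def strip_code_fences(text: str) -> str:
    """Remove surrounding markdown code fences, if present."""
    return FENCE_RE.sub("", text.strip()).strip()

def parse_llm_json(raw: str) -> dict:
    """Strip fences first, then parse — defensive at the boundary."""
    return json.loads(strip_code_fences(raw))

response = '```json\n{"tasks": ["triage", "route"]}\n```'
print(parse_llm_json(response))  # {'tasks': ['triage', 'route']}
```

Note that the helper is a no-op on clean JSON, so already well-formed responses pass through unchanged.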
It's a small change, but it speaks to a larger principle: when you're building production AI systems, you need defensive parsing at every boundary. LLMs are powerful but unpredictable. Your infrastructure needs to absorb that variability without breaking.
Why It Matters
This fix improves reliability across the entire Strug Works orchestration pipeline. Every mission that goes through the Dispatcher now has a cleaner path from intent to execution. Fewer parsing errors mean faster mission starts and less noise in our error logs.
For teams building on Strug Works, this is invisible—which is exactly how infrastructure improvements should feel. Your missions just work, and you don't have to think about the LLM formatting quirks we're handling under the hood.
What's Next
This fix is part of a broader effort to harden our LLM integration points. We're building out more robust schema validation for all agent-to-agent communication, and exploring structured output modes that bypass formatting issues entirely.
We're also improving our precommit validation gates to catch these kinds of formatting edge cases earlier in development. The goal: every component that touches LLM output should have tests covering common formatting variations, not just the happy path.
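A sketch of what formatting-variation tests for such a component might look like. The parser and test cases here are hypothetical, standing in for whatever helper a team actually ships:

```python
import json
import re

# Hypothetical parser under test — strips markdown fences, then parses.
def parse_llm_json(raw: str) -> dict:
    cleaned = re.sub(r"^```[\w-]*\n?|\n?```$", "", raw.strip()).strip()
    return json.loads(cleaned)

# Cover common LLM formatting quirks, not just the happy path.
VARIATIONS = [
    '{"tasks": []}',                    # happy path: bare JSON
    '```json\n{"tasks": []}\n```',      # fenced with language tag
    '```\n{"tasks": []}\n```',          # fenced without language tag
    '  ```json\n{"tasks": []}\n```  ',  # surrounding whitespace
]

for raw in VARIATIONS:
    assert parse_llm_json(raw) == {"tasks": []}
print("all formatting variations parsed")
```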
If you're building AI systems, take this as a reminder: always assume your LLM will surprise you. Build your parsing layers to be forgiving, your validation to be strict, and your error messages to be informative. That's how you go from prototype to production.