Why Your AI Agent's Responses Look Broken (And How We Fixed It)

Sometimes the most frustrating bugs are the ones that seem impossible until you understand the data structure. This week we shipped a fix for exactly that kind of issue in Sabine's agent streaming logic.

The Problem

When Sabine's agent runs on Claude and makes a tool call, the shape of the response changes. Instead of a plain string, AIMessage.content comes back as a list of typed content blocks, something like [{"type": "text", "text": "actual response here"}]. Our stream response extraction code did not account for the list form, so after a tool call the agent's responses came through malformed or incomplete.
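To make the two shapes concrete, here is a small sketch. The values are made up for illustration, and naive_extract is a hypothetical stand-in for string-only extraction, not the code that was actually in sabine_agent.py.

```python
# Illustrative values only: stand-ins for AIMessage.content, not captured output.
plain_content = "Your meeting is at 3pm."

block_content = [
    {"type": "text", "text": "Your meeting is at 3pm."},
    {
        "type": "tool_use",
        "id": "toolu_example",   # made-up id
        "name": "check_memory",  # hypothetical tool name
        "input": {"query": "meetings today"},
    },
]


def naive_extract(content):
    """Hypothetical extractor that treats content as if it were always a string."""
    return str(content)


print(naive_extract(plain_content))  # the reply text, unchanged
print(naive_extract(block_content))  # the repr of the whole list, not the reply text
```

The second call shows the failure mode in miniature: the reply text is in there, but wrapped in structure the user should never see.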

This wasn't immediately obvious. The agent worked fine for simple back-and-forth conversations. It only broke when the agent needed to use a tool—check memory, update a database, call an external API—and then respond with context from that tool call. That's when the content structure shifted and our extraction logic failed.

The Fix

The solution was to flatten those content blocks properly. We updated sabine_agent.py to detect when content comes back as a list of blocks and extract the text field from each block, then concatenate them into a single coherent response. It's a small change in the code but a significant improvement in reliability.
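The flattening step described above can be sketched roughly like this. It is a minimal version under stated assumptions, not the exact code in sabine_agent.py: the name flatten_content is ours, and skipping non-text blocks (rather than logging or raising) is an assumption.

```python
from typing import Any, List, Union


def flatten_content(content: Union[str, List[Any]]) -> str:
    """Return the text of a message whose content is a string or a list of blocks.

    Hypothetical helper: the name and the skip-unknown-blocks policy are
    assumptions made for this sketch.
    """
    # Simple conversations: content is already a plain string.
    if isinstance(content, str):
        return content

    parts = []
    for block in content:
        if isinstance(block, str):
            # Lists may contain bare strings alongside dict blocks.
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        # Anything else (tool_use, image, ...) carries no display text: skip it.
    return "".join(parts)


blocks = [
    {"type": "text", "text": "I checked your notes. "},
    {"type": "tool_use", "id": "toolu_example", "name": "check_memory", "input": {}},
    {"type": "text", "text": "Your meeting is at 3pm."},
]
print(flatten_content(blocks))
```

Joining with an empty string keeps whatever spacing the model produced inside each block. Whether to insert a separator between blocks is a judgment call that depends on how the text is displayed.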

Now when the agent calls a tool and responds, the stream extraction correctly handles Claude's structured content format. No more broken responses. No more confusion about why an agent seemed to go silent after using a function.

Why This Matters

This is the kind of bug that's easy to miss in testing but breaks user trust in production. If an AI agent stops mid-conversation or returns garbled output, users assume the whole system is unreliable. The reality is that LLM APIs have quirks—different response formats depending on context, structured outputs that change shape based on what the model is doing—and robust agent systems need to handle all of those edge cases gracefully.

We're building Sabine and Strug Works to be production-grade platforms, which means sweating these details. Every bug we catch and fix makes the system more reliable for the next person who depends on it.

What's Next

This fix is part of a broader effort to harden Sabine's agent runtime. We're expanding test coverage around tool calls and streaming responses to include edge cases like multi-block content, image content blocks, and mixed-type responses. We're also improving observability so we can catch these issues before they hit production. Expect more updates as we continue to refine the agent execution layer.
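As a rough idea of what those edge-case tests might look like, here is a sketch of a case table that can be run against any extraction function. The cases, block values, and the check_extractor name are illustrative assumptions, not our actual test suite.

```python
# Each case: (label, content as it might appear on a message, expected text).
# All values are invented for illustration.
EDGE_CASES = [
    ("plain string", "hello", "hello"),
    ("single text block", [{"type": "text", "text": "hello"}], "hello"),
    (
        "multi-block",
        [{"type": "text", "text": "a "}, {"type": "text", "text": "b"}],
        "a b",
    ),
    (
        "mixed text and tool_use",
        [
            {"type": "text", "text": "Done."},
            {"type": "tool_use", "id": "toolu_example", "name": "update_db", "input": {}},
        ],
        "Done.",
    ),
    (
        "image block only",
        [{"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": ""}}],
        "",
    ),
    ("empty list", [], ""),
]


def check_extractor(extract):
    """Run every edge case through `extract`; return labels of the cases that fail."""
    failures = []
    for label, content, expected in EDGE_CASES:
        try:
            ok = extract(content) == expected
        except Exception:
            ok = False  # an extractor that raises on a shape also counts as failing
        if not ok:
            failures.append(label)
    return failures
```

Passing the real extraction function to check_extractor should return an empty list. Any labels that come back point straight at the content shape that still needs handling.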