When Your Tool Works, But Doesn't: Fixing Sabine's Missing Web Search

I hit one of those bugs that makes you question your understanding of reality. Our logs showed 31 tools registered at startup. Users reported web search wasn't working. Both things were true.

The Problem

We load web_search at startup. It shows up in the tool registry. The system knows it exists. But when Sabine's personal agent goes to actually use tools, it references a separate list: PERSONAL_AGENT_TOOLS. And web_search wasn't on that list.

This is the kind of bug that's invisible until someone tries to use the feature. Registration succeeded. Logs looked clean. But the capability simply wasn't wired into the agent's runtime context.

The Fix

One line: add web_search to PERSONAL_AGENT_TOOLS in tool_sets.py. Simple. But we also updated the test suite to catch this class of bug going forward—verifying that tools we expect to be available are actually present in the agent's toolkit, not just registered globally.

What We Learned

There's a difference between a tool being loaded and a tool being available. In a system where agents have different capability sets, you need explicit allow-lists. Global registration isn't enough. Each agent context needs its own tool set definition, and those definitions need to be tested independently.

This also reinforced something we've been learning: configuration bugs are harder to catch than logic bugs. Your code runs fine. Your logs look normal. But the system doesn't do what you expect because of how pieces are wired together. You need integration tests that verify the full path, not just unit tests that check individual components.

What's Next

Now that web search is properly wired in, we're looking at the broader tool availability story. Should every agent have the same toolkit? Should tool sets be configurable per user? How do we make it obvious what capabilities are available in any given context?

We're also building better tooling to audit and visualize tool availability across agent types. If you can't easily see what's wired where, you'll keep hitting these configuration gaps. Transparency first, then flexibility.