Updating Model IDs Before They Break: A Small Fix That Keeps Everything Running

Yesterday I merged a small but important fix: updating every reference to Anthropic's deprecated Claude Sonnet 4 model identifier. The commit touched five source files across the agent platform (core execution logic, LLM configuration, orchestration, parsing, and scheduling), plus the test suite.

What Changed

Anthropic marked the claude-sonnet-4-20250514 model identifier for retirement, and claude-sonnet-4-6 takes its place. I went through the codebase and updated every hardcoded reference: in core.py, llm_config.py, orchestrator.py, parsing.py, scheduler.py, and the test suite.
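A change this mechanical is easy to script. Here is a minimal Python sketch, assuming the identifier only ever appears as a plain string literal and that a straight text substitution is safe in every file. `replace_model_id` is a hypothetical helper, not the script actually used for this commit.

```python
from pathlib import Path

# Identifiers named in this post. Assumption: a plain text substitution is safe.
OLD_MODEL_ID = "claude-sonnet-4-20250514"
NEW_MODEL_ID = "claude-sonnet-4-6"


def replace_model_id(paths, old=OLD_MODEL_ID, new=NEW_MODEL_ID):
    """Hypothetical helper: replace every `old` with `new` in the given files.

    Returns {path: replacement_count} for the files that changed; files
    that don't contain the old identifier are never rewritten.
    """
    changed = {}
    for path in map(Path, paths):
        text = path.read_text(encoding="utf-8")
        count = text.count(old)
        if count:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed[str(path)] = count
    return changed
```

Called with the five modules listed above, the returned per-file counts make it easy to confirm in review that every expected file actually changed and nothing else did.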

This is the kind of maintenance work that doesn't generate headlines but matters enormously. If I'd waited until the old model ID stopped working, the entire agent platform would have broken mid-task. No code execution. No orchestration. No autonomous work.

Why This Matters

Running an organization where the engineering team is autonomous agents means the stakes are different. When a model identifier becomes invalid, it's not just one person's workflow that stops — it's the entire organization. Strug Works can't ship PRs. Sabine can't process calendar events. The platform halts.

The fix itself was straightforward: a find-and-replace across six locations. Catching it before it broke, though, required monitoring Anthropic's deprecation notices and acting proactively. This is one of the infrastructure hygiene tasks I handle directly, not because agents can't do it, but because the monitoring and prioritization step isn't automated yet.

The Test Audit Layer

I also updated test_model_ids_audit.py — a test that explicitly audits the codebase for model identifier consistency. This test now enforces that we're using claude-sonnet-4-6 everywhere. If a future commit accidentally reintroduces the old model ID, CI will catch it.
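The post doesn't show the audit test's contents, so here is a hedged sketch of the scanning half of such a guardrail. `find_retired_ids` and `RETIRED_MODEL_IDS` are hypothetical names, and the real test_model_ids_audit.py may check more (for instance, that claude-sonnet-4-6 is the ID actually configured).

```python
from pathlib import Path

# Hypothetical set of identifiers the audit forbids.
RETIRED_MODEL_IDS = frozenset({"claude-sonnet-4-20250514"})


def find_retired_ids(root, retired=RETIRED_MODEL_IDS, pattern="*.py", exclude=frozenset()):
    """Return (path, line_number, model_id) for each retired ID found under root.

    Files whose name is in `exclude` are skipped, e.g. the audit test itself,
    which has to contain the retired ID in order to search for it.
    """
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if path.name in exclude or not path.is_file():
            continue
        # errors="replace" keeps one oddly encoded file from crashing the audit.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for model_id in sorted(retired):
                if model_id in line:
                    hits.append((str(path), lineno, model_id))
    return hits
```

In a pytest file, a test would call `find_retired_ids(repo_root, exclude={"test_model_ids_audit.py"})` and assert the result is empty. Reporting file and line numbers means a CI failure points straight at the commit that reintroduced the old ID.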

This is part of the TDD audit pattern we use across the platform: write tests that encode architectural guardrails, not just feature correctness. The test suite becomes a living specification of what's allowed and what isn't.

What's Next

The immediate fix is done, but it raises a bigger question: how do we automate this? Right now I'm manually monitoring Anthropic's changelog and deprecation notices. That doesn't scale.

I'm considering building a small automation layer that:

• Monitors Anthropic's model deprecation feed

• Opens a Linear issue when a model ID we're using is marked for retirement

• Potentially even dispatches a mission to Strug Works to update the codebase proactively
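None of this exists yet, but the core decision behind the first two bullets is small. Here is a minimal sketch, assuming deprecation notices have already been fetched and parsed into a hypothetical `DeprecationNotice` record. Fetching the feed, calling Linear, and dispatching a Strug Works mission are left out, because their interfaces aren't described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DeprecationNotice:
    # Hypothetical shape: a real deprecation feed would need parsing into this.
    model_id: str
    retirement_date: date


def ids_needing_action(notices, ids_in_use):
    """Return notices for model IDs we currently use, soonest retirement first."""
    in_use = set(ids_in_use)
    affected = [n for n in notices if n.model_id in in_use]
    return sorted(affected, key=lambda n: n.retirement_date)


def issue_title(notice, today):
    """Build a title for the tracking issue, including the days remaining."""
    days_left = (notice.retirement_date - today).days
    return (
        f"Replace {notice.model_id} "
        f"(retires {notice.retirement_date.isoformat()}, {days_left} days left)"
    )
```

Sorting by retirement date puts the most urgent migration first, and the days-left count in the title tells whoever picks up the issue, human or agent, how much runway remains.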

That would close the loop: from deprecation notice to production fix, fully autonomous. For now, though, the manual catch worked. The platform is stable. And the test audit ensures we won't drift back to the old identifier by accident.

Small fixes. Big consequences. This is infrastructure work.