Emergency Save: How We Stopped Losing Agent Work to Token Budgets

Autonomous agents don't work in neat, atomic commits. They iterate, debug, refactor, and sometimes run up against their token budgets before they've had a chance to commit their progress. Until today, that meant lost work.

The Problem: Invisible Work Loss

Our agent_executor and task_runner services power every agent on the Strug Works platform. They handle git operations, file writes, test runs, and tool calls. When an agent like sc-backend or sc-frontend is deep in a refactor—files modified, tests passing locally, PR not yet opened—and hits a token budget limit or encounters an unexpected error, the execution halts.

The uncommitted changes? Gone. The agent has to start over, re-reading files it already analyzed, re-applying edits it already made. It's inefficient, wastes tokens on redundant work, and occasionally causes agents to lose context on complex multi-file changes.

The Fix: Emergency Save on Exit

We added an emergency save mechanism that activates when an agent's execution is interrupted. Before the task_runner shuts down the agent's context, it checks the workspace for uncommitted changes. If it finds any, it commits them automatically with a message that clearly marks the commit as an emergency save, so the work is preserved along with a record of why it was committed.
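To make the control flow concrete, here is a minimal sketch of the idea in Python. It is not the actual task_runner code: BudgetExceeded, run_with_emergency_save, and the two callbacks are hypothetical names invented for illustration. The sketch assumes the check runs on every exit path and that the exit reason is passed along to the commit step.

```python
from typing import Callable


class BudgetExceeded(Exception):
    """Hypothetical signal that an agent ran out of token budget."""


def run_with_emergency_save(
    task: Callable[[], None],
    has_uncommitted_changes: Callable[[], bool],
    emergency_commit: Callable[[str], None],
) -> None:
    """Run a task and commit any leftover work before the context goes away."""
    reason = "completed"
    try:
        task()
    except BudgetExceeded:
        reason = "budget_exhausted"
        raise
    except Exception:
        reason = "error"
        raise
    finally:
        # Runs on every exit path, before the caller tears down the workspace.
        # The original exception (if any) still propagates afterwards.
        if has_uncommitted_changes():
            emergency_commit(reason)
```

The callbacks keep the sketch independent of git, which also makes the interruption paths easy to exercise in tests with simple fakes.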

This isn't a replacement for disciplined git hygiene—agents are still expected to commit as they go. But it's a safety net for edge cases: token exhaustion, transient service errors, or tasks that are legitimately larger than anticipated.

Technical Details

The implementation lives in backend/services/agent_executor.py and backend/services/task_runner.py. On task completion or error, the executor inspects the git working tree. If there are staged or unstaged changes, it calls git_commit with a generated message that includes the task ID and a marker identifying the commit as an emergency save.
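The detection and commit steps could look roughly like the sketch below. It calls the git CLI directly as a stand-in for the platform's git_commit helper, whose signature is not shown here. The function names, the message format, and the choice to stage untracked files with `git add --all` are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def parse_porcelain(output: str) -> List[str]:
    """Return the path portion of each `git status --porcelain` (v1) line."""
    paths = []
    for line in output.splitlines():
        if len(line) > 3:
            # Columns 0-1 hold the status code and column 2 is a space.
            # Renames appear as "old -> new" and are kept as-is here.
            paths.append(line[3:])
    return paths


def build_message(task_id: str, reason: str) -> str:
    """Hypothetical message format carrying the task ID and an emergency marker."""
    return f"[emergency-save] task={task_id} reason={reason}"


def emergency_save(workspace: Path, task_id: str, reason: str) -> bool:
    """Commit pending changes in `workspace`; return True if a commit was made."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=workspace, capture_output=True, text=True, check=True,
    )
    if not parse_porcelain(status.stdout):
        return False  # Clean working tree, nothing to save.
    # Assumption: stage everything, including untracked files.
    subprocess.run(["git", "add", "--all"], cwd=workspace, check=True)
    subprocess.run(
        ["git", "commit", "-m", build_message(task_id, reason)],
        cwd=workspace, check=True,
    )
    return True
```

A call such as `emergency_save(Path("/path/to/workspace"), "task-123", "budget_exhausted")` would then produce a commit that is easy to find later by searching the log for the marker.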

We also updated the test suite (backend/tests/test_agent_executor_r2_audit.py and a new test for emergency save workflows) to verify this behavior under both budget exhaustion and exception scenarios.

A fix to the budget flows UI shipped alongside this feature. It improves how agents and operators see token budget consumption in Strug Central, which makes it easier to diagnose when and why emergency saves are triggered.

What's Next

Emergency save is a reliability improvement, not a process change. We're continuing to refine agent token budgeting so interruptions are rare. Next up: smarter budget allocation based on task complexity, and proactive warnings when agents are approaching limits so they can wrap up cleanly rather than relying on emergency saves.