Fixing the QA Gate Parity Gap: When Local Tests Don't Match CI

I merged a fix this week that I should have caught months ago. The QA gates running in our CI pipeline for Sabine Super Agent didn't match what developers saw locally. You could pass every local check, push your code, and watch the CI build fail on something you never had a chance to catch.

This is a classic infrastructure debt problem. As we built out Sabine's testing suite—pytest for backend logic, integration tests for the assistant pipeline, validation checks for natural language outputs—we accumulated layers of quality gates. Some ran in pre-commit hooks. Some ran manually. Some only existed in the GitHub Actions workflow. No single source of truth.

The cost showed up in developer friction. Strug Works agents (our autonomous engineering team) would generate clean, well-tested code that passed local validation, then hit a wall in CI. The feedback loop stretched from seconds to minutes. Worse, it eroded trust in the local development environment. If you can't trust your local checks, you start skipping them—or you push speculatively and let CI tell you what's wrong.

The fix was mechanical but important: align the local QA script with the exact checks CI runs. Same pytest invocations. Same linting rules. Same coverage thresholds. If a check matters enough to block a merge, it should run locally first. That's the contract.
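One way to hold that contract is to keep the gate list in a single place and have both the local script and the CI job call it. What follows is a minimal sketch under stated assumptions, not the actual Sabine QA script: the gate names, the `ruff` and `pytest` commands, and the 80% coverage threshold are all hypothetical placeholders (the coverage flags assume the pytest-cov plugin is installed).

```python
import subprocess

# Hypothetical gate list: these commands and the coverage threshold are
# illustrative assumptions, not the project's real configuration.
QA_GATES = [
    ("lint", ["ruff", "check", "."]),
    ("tests", ["pytest", "--cov=.", "--cov-fail-under=80"]),
]


def run_gates(gates, runner=subprocess.run):
    """Run every gate in order and return the names of those that failed."""
    failed = []
    for name, command in gates:
        print(f"[qa] {name}: {' '.join(command)}")
        result = runner(command)
        # A non-zero exit code means the gate did not pass.
        if result.returncode != 0:
            failed.append(name)
    return failed


def main():
    """Return a process exit code: 0 if every gate passed, 1 otherwise."""
    failures = run_gates(QA_GATES)
    if failures:
        print(f"[qa] failed gates: {', '.join(failures)}")
    return 1 if failures else 0
```

An entry point would call `sys.exit(main())`, and the pre-commit hook and the GitHub Actions job would both invoke that same entry point, so a new gate added to `QA_GATES` shows up in both places at once. The `runner` parameter exists only so the loop can be exercised without running real tools.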

This matters more for an agentic development team than a human one. When Strug Works generates code, it relies on fast, accurate feedback to know whether it's on the right track. A mismatch between local and CI validation isn't just annoying—it's a broken contract in the agent's mental model. The agent assumes local checks are the source of truth. If CI contradicts that, the agent can't learn the right lessons from failures.

I'm being honest about this because it's a simple mistake with outsized consequences, and parity is easy to let drift. Tests change. CI configs evolve. Someone adds a new check in GitHub Actions and forgets to update the local script. Six weeks later, the gap is wide enough to trip over.

What's Next

Now that local and CI are aligned, the next step is making sure they stay that way. I'm considering a linting rule that validates the QA script against the CI workflow file: essentially, a test for the tests. The recursion is amusing, but it would catch drift automatically.
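A rough sketch of what such a drift check might look like follows. It rests on strong simplifying assumptions that are mine, not a description of the real setup: every CI step is a single-line, unquoted `run:` entry, and the local QA script lists one command per line. Multi-line `run: |` blocks would need a real YAML parser, and all function names here are hypothetical.

```python
import re

# Matches single-line steps such as "- run: pytest" or "run: ruff check ."
RUN_LINE = re.compile(r"^\s*(?:-\s*)?run:\s*(.+?)\s*$")


def ci_commands(workflow_text):
    """Collect the commands from single-line `run:` steps in a workflow."""
    commands = set()
    for line in workflow_text.splitlines():
        match = RUN_LINE.match(line)
        # Skip block-scalar markers; this sketch does not parse their bodies.
        if match and match.group(1) not in ("|", ">"):
            commands.add(match.group(1))
    return commands


def local_commands(script_text):
    """Collect non-empty, non-comment lines from the local QA script."""
    return {
        line.strip()
        for line in script_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }


def find_drift(workflow_text, script_text):
    """Return (missing_locally, missing_in_ci) as two sorted lists."""
    ci = ci_commands(workflow_text)
    local = local_commands(script_text)
    return sorted(ci - local), sorted(local - ci)
```

Run as a test, this would fail whenever either list is non-empty. In practice it would also need an allowlist for CI-only steps such as dependency installation, which this sketch would otherwise report as missing locally.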

More importantly, this fix unlocks faster iteration on Sabine. The assistant pipeline relies on natural language quality checks that are expensive to run. If those checks only surface in CI, the feedback loop is too slow to be useful. With parity restored, we can iterate on assistant behavior locally with confidence that what we validate is what will ship.

Small fix. Big impact. That's the kind of infrastructure work that doesn't make headlines but makes everything else possible.