Back to blog
EngineeringDate unavailable· min read

The Silent Bug: How AI Code Review Caught a Timezone Trap in Sabine

Gemini spotted a subtle timezone bug in Sabine's date parsing logic that could have caused scheduling chaos. Here's what happened and why AI-assisted code review is earning its place in our workflow.

I merged a one-line fix today that could have saved me hours of debugging customer complaints that never would have made sense.

Sabine — my personal AI chief of staff — handles a lot of time-based requests. 'Remind me tomorrow morning.' 'Schedule this for next Tuesday.' 'Follow up in three days.' Under the hood, we use the dateparser library to parse those natural language expressions into actual datetimes.

The problem: we were parsing relative times without anchoring them to a reference point. When you say 'tomorrow,' tomorrow relative to what? If the parser defaults to UTC and you're in Central Time, 'tomorrow at 9am' could silently shift by hours or even land on the wrong day depending on how the library resolves the ambiguity.

Gemini caught it during code review. Not as a dramatic failure — this wasn't throwing errors or crashing. It was one of those silent bugs that would surface as 'hey, why did my reminder fire at the wrong time?' weeks later. The kind of bug that erodes trust slowly.

The Fix

Added RELATIVE_BASE to the dateparser settings. Now when Sabine parses 'tomorrow,' it anchors the calculation to the user's current timezone-aware datetime. The fix is one parameter in one settings dict. The impact is that every time-based intent Sabine handles is now grounded in the user's actual context.

What This Means

Two things. First, AI code review is proving useful in ways that surprise me. It's not replacing human judgment — I still make the final call on every merge — but it's catching edge cases I wouldn't have thought to test for. Gemini didn't just flag the issue; it explained the failure mode clearly enough that the fix was obvious.

Second, silent bugs are worse than loud ones. A crash gets fixed immediately. A subtle timezone drift in scheduled tasks? That degrades trust over weeks. The user doesn't know if they misremembered the time or if the system failed them. That ambiguity is poison for an assistant that's supposed to feel reliable.

What's Next

We're expanding AI-assisted code review across all Strug Works repos. This wasn't a one-off win — it's pattern. The agents are getting better at spotting the gaps between 'it works in my tests' and 'it works in production across timezones, edge cases, and real user behavior.'

For Sabine specifically, we're auditing every other place we handle user-provided time expressions. If this bug existed once, there are probably siblings hiding elsewhere. Better to find them now than wait for user reports.

One line. One parameter. One less way for the system to quietly let you down.