We shipped voice input to Sabine this week. Not as a gimmick, not as a feature checkbox—but because typing isn't always the fastest way to think out loud with your AI partner.
What Shipped
Phase 1 voice capabilities are now live in Sabine. We integrated Groq Whisper for transcription, built a usePTT React hook to manage push-to-talk state and audio capture, and designed a VoiceButton component that lives right in the chat interface. The integration touches ChatInput and ChatThread—anywhere you'd normally type, you can now speak.
Voice input has two modes: push-to-talk for quick thoughts and continuous dictation for longer-form input. Groq's Whisper endpoint handles the heavy lifting on transcription speed and accuracy. The hook manages microphone permissions, audio stream lifecycle, and error states. The button gives real-time visual feedback for each stage: recording, processing, transcribing.
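To make the lifecycle concrete, here is a minimal sketch of the kind of state machine a push-to-talk hook could sit on top of. This is not the actual usePTT source. The type names, event names, and the split between "processing" (finalizing captured audio) and "transcribing" (waiting on the transcription response) are all assumptions made for illustration.

```typescript
// Hypothetical state model for a push-to-talk hook.
// All names here are illustrative assumptions, not Sabine's real usePTT internals.
type VoiceMode = "push-to-talk" | "dictation";

type VoiceState =
  | { status: "idle" }
  | { status: "recording"; mode: VoiceMode }
  | { status: "processing" } // assumed: finalizing the captured audio
  | { status: "transcribing" } // assumed: waiting on the transcription response
  | { status: "error"; message: string };

type VoiceEvent =
  | { type: "START"; mode: VoiceMode }
  | { type: "STOP" }
  | { type: "AUDIO_READY" }
  | { type: "TRANSCRIPT_RECEIVED" }
  | { type: "FAILED"; message: string }
  | { type: "RESET" };

function voiceReducer(state: VoiceState, event: VoiceEvent): VoiceState {
  // A failure (denied mic permission, network error, ...) can arrive in any state.
  if (event.type === "FAILED") {
    return { status: "error", message: event.message };
  }
  // RESET always returns to idle; it is the only way out of the error state.
  if (event.type === "RESET") {
    return { status: "idle" };
  }
  // Every other event is only valid in one state; out-of-order events are ignored.
  switch (state.status) {
    case "idle":
      return event.type === "START"
        ? { status: "recording", mode: event.mode }
        : state;
    case "recording":
      return event.type === "STOP" ? { status: "processing" } : state;
    case "processing":
      return event.type === "AUDIO_READY" ? { status: "transcribing" } : state;
    case "transcribing":
      return event.type === "TRANSCRIPT_RECEIVED" ? { status: "idle" } : state;
    case "error":
      return state;
  }
}

// Maps a state to the text a button might show (labels are placeholders).
function buttonLabel(state: VoiceState): string {
  switch (state.status) {
    case "idle":
      return "Hold to talk";
    case "recording":
      return state.mode === "dictation" ? "Dictating" : "Recording";
    case "processing":
      return "Processing";
    case "transcribing":
      return "Transcribing";
    case "error":
      return "Error: " + state.message;
  }
}
```

In a React hook, a reducer like this could be passed to useReducer, with the microphone and network side effects dispatching events into it. Keeping the transitions in a pure function means the button's visual states can be tested without a microphone or an API key.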
Why It Matters
Voice changes the conversational dynamic. When you're brainstorming, troubleshooting, or working through a problem with Sabine, speaking is often faster and more natural than typing. It lowers friction. You're not composing—you're thinking out loud.
For technical founders juggling context across product, code, and team, this is about speed of thought. You shouldn't have to slow down to type when you're in flow. Voice input meets you where your brain already is.
This also sets the foundation for richer interaction patterns down the line. Once voice input is solid, voice output, ambient listening, and multimodal context become possible. But we're shipping in phases—input first, get it right, then build forward.
What's Next
Phase 2 is voice output—Sabine talking back. We're evaluating TTS providers for latency, naturalness, and cost. The goal is seamless back-and-forth: you speak, Sabine responds audibly, and the conversation flows without breaking stride.
Beyond that: ambient mode (always-listening context awareness), speaker identification for team conversations, and integration with Strug Works task orchestration so you can dispatch work verbally. Voice isn't just an input method—it's a new interaction model.
We're also collecting feedback on transcription accuracy, UI responsiveness, and edge cases (background noise, accents, multilingual speech). If you're using Sabine and trying voice input, let us know what works and what doesn't. We're iterating fast.
Voice is live. Try it. Tell us what you think. We're just getting started.