We shipped semantic memory retrieval this week. It's one of those changes that sounds simple but touches everything: how agents remember, how they prioritize context, and ultimately how useful they are when you need them to connect the dots.
The Problem: Keyword Matching Isn't Enough
Before this change, our memory system used confidence scores and recency to rank what agents should remember. That works fine when you're looking for exact matches or recent interactions. But it breaks down when concepts are related without sharing keywords. Ask about 'database performance' and you'll miss memories about 'query optimization'—even though they're deeply connected.
What We Built
We added pgvector to our Postgres stack and ran migration 027 to add a vector(1536) column to agent_memory with IVF indexing. Every memory entry now gets embedded into a 1536-dimensional vector space using OpenAI's text-embedding-3-small model. When an agent queries memory, we calculate cosine similarity between the query embedding and stored memory vectors—then return results ranked by semantic relevance.
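In production the ranking happens inside Postgres via pgvector's distance operators, but the core step can be sketched in plain Python. This is a minimal illustration, not our actual retrieval code; the function names and the `(memory_id, embedding)` shape are hypothetical.

```python
from math import sqrt

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); returns 0.0 for zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def rank_memories(query_vec, memories):
    # memories: list of (memory_id, embedding) pairs.
    # Score each stored vector against the query, highest similarity first.
    scored = [(mid, cosine_similarity(query_vec, vec)) for mid, vec in memories]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

With pgvector, the equivalent happens in SQL using the cosine-distance operator, so the database (and its index) does the heavy lifting instead of application code.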
The critical detail: we kept the confidence-sort fallback. If semantic search doesn't return enough results (or if embeddings aren't available), the system falls back to the original confidence + recency ranking. This means agents never get stuck without context—they just get better context when it's available.
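The fallback logic looks roughly like this. Again a sketch with hypothetical names and thresholds, assuming each memory is a dict with `confidence`, `recency`, and an optional `embedding`:

```python
from math import sqrt

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_memories(query_vec, memories, limit=5, min_hits=3, threshold=0.3):
    """Semantic search first; fall back to confidence + recency ranking."""
    semantic = []
    if query_vec is not None:
        for m in memories:
            vec = m.get("embedding")
            if vec is None:
                continue  # entries without embeddings can't be scored semantically
            score = cosine_similarity(query_vec, vec)
            if score >= threshold:
                semantic.append((score, m))
        semantic.sort(key=lambda t: t[0], reverse=True)
    if len(semantic) >= min_hits:
        return [m for _, m in semantic[:limit]]
    # Fallback: the original confidence + recency ordering.
    return sorted(memories,
                  key=lambda m: (m["confidence"], m["recency"]),
                  reverse=True)[:limit]
```

The key property is that the function always returns something: missing embeddings or a thin semantic result set degrade gracefully to the old ranking rather than to an empty context.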
Implementation Notes
We updated agent_executor.py and memory_injector.py to handle the new vector retrieval path. Test coverage was added in test_memory_injector_pgvector.py to verify both the semantic search and fallback behavior. The IVF index keeps query times reasonable even as memory scales—though we're monitoring performance closely as usage grows.
Why It Matters
This changes how agents connect information. Instead of relying on exact phrase matches, they can now surface related concepts, similar problems, and relevant context even when the wording is completely different. For workflows that span days or weeks, this is the difference between agents that feel like they 'get it' and agents that make you repeat yourself.
What's Next
We're watching query latency and index performance as memory volume grows. Next steps include tuning the IVF parameters for optimal recall vs. speed tradeoffs, and potentially experimenting with hybrid ranking that blends semantic similarity with confidence scores instead of treating them as either/or. We're also considering batch embedding jobs to backfill vectors for existing memories that predate this migration. The foundation is solid—now we refine.
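One shape the hybrid ranking could take is a simple weighted blend, where a single knob trades off semantic similarity against confidence. This is speculative; the `alpha` parameter and tuple layout here are illustrative, not settled design:

```python
def hybrid_score(semantic_sim, confidence, alpha=0.7):
    # Weighted blend of two signals, both assumed normalized to [0, 1]:
    # alpha weights semantic similarity, (1 - alpha) weights confidence.
    return alpha * semantic_sim + (1.0 - alpha) * confidence

def rank_hybrid(candidates, alpha=0.7):
    # candidates: list of (memory_id, semantic_sim, confidence) tuples.
    return sorted(candidates,
                  key=lambda c: hybrid_score(c[1], c[2], alpha),
                  reverse=True)
```

The appeal of a blend over the current either/or scheme is that a high-confidence memory with moderate similarity can still outrank a marginal semantic hit, and the balance can be tuned empirically once we have latency and recall numbers.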