We built a confidence-tiered memory system for Sabine. HIGH-confidence facts get surfaced first. MEDIUM-tier gets mixed in. LOW-confidence memories round out the context. Standard retrieval ranking, nothing fancy.
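The tiered surfacing described above can be sketched roughly like this. The `Tier` enum and `order_memories` helper are illustrative assumptions, not Sabine's actual code:

```python
from enum import IntEnum

class Tier(IntEnum):
    # Lower value = surfaced earlier in the context window.
    HIGH = 0    # surfaced first
    MEDIUM = 1  # mixed in
    LOW = 2     # rounds out the context

def order_memories(memories):
    """Sort retrieved memories so HIGH-confidence facts come first.

    `memories` is a list of (tier, text) pairs -- a hypothetical
    shape chosen for this sketch.
    """
    return sorted(memories, key=lambda m: m[0])
```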
Except the LLM was systematically ignoring anything marked LOW. Not deprioritizing — completely discarding. I noticed it first in task execution: Sabine would act like she'd never seen information I knew was in her memory. The retrieval logs showed the memories being fetched. The LLM just… didn't use them.
The culprit was in our context header. For LOW-tier memories, we wrote: 'Uncertain memories (acknowledge uncertainty).' Sounds reasonable, right? We wanted the model to know these facts might be less reliable.
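A rough reconstruction of the problematic header setup. The LOW-tier label is quoted from the post; the HIGH/MEDIUM labels, the `##` header syntax, and the function shape are assumptions for illustration:

```python
# Reconstruction of the poisoned framing. Only the LOW label is
# taken from the post; everything else here is an assumed stand-in.
TIER_HEADERS = {
    "HIGH": "Established facts",
    "MEDIUM": "Recent context",
    "LOW": "Uncertain memories (acknowledge uncertainty)",  # the culprit
}

def build_context(memories_by_tier):
    """Assemble the memory section of the prompt, tier by tier."""
    sections = []
    for tier in ("HIGH", "MEDIUM", "LOW"):
        items = memories_by_tier.get(tier, [])
        if items:
            sections.append(f"## {TIER_HEADERS[tier]}")
            sections.extend(f"- {m}" for m in items)
    return "\n".join(sections)
```

Note that the parenthetical rides along into every prompt that contains a LOW-tier memory.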
But that parenthetical — 'acknowledge uncertainty' — became an instruction. We weren't just labeling confidence. We were telling the LLM how to behave toward that tier. And 'acknowledge uncertainty' apparently translates, in practice, to 'you can probably ignore this.'
The fix was simple: neutral labels. No instructions, no framing, no editorial. Just factual descriptors of what each tier represents. HIGH = 'Core facts and preferences.' MEDIUM = 'Working context.' LOW = 'Supplementary information.' That's it.
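As a sketch, the fix is nothing more than swapping the label strings. The label text comes from the post; the dict shape and helper are illustrative assumptions:

```python
# Neutral tier labels: factual descriptors of what each tier holds,
# with no behavioral instructions embedded in the header text.
NEUTRAL_TIER_HEADERS = {
    "HIGH": "Core facts and preferences",
    "MEDIUM": "Working context",
    "LOW": "Supplementary information",
}

def header_for(tier):
    """Return the neutral section header for a confidence tier."""
    return NEUTRAL_TIER_HEADERS[tier]
```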
What I learned: LLMs are instruction-following machines. Every word in your prompt is a potential instruction, not just context. If you put 'acknowledge uncertainty' in a section header, you're not documenting — you're teaching. And the model learns fast.
This wasn't a retrieval problem or a ranking problem. The right memories were getting fetched. This was a prompt engineering problem masquerading as a systems problem. The architecture was fine. The framing was poisoned.
What's Next
We're auditing every other context header in the system now. Anywhere we're giving the LLM 'guidance' on how to interpret a section, we're asking: are we documenting, or are we training? If it's the latter, we're making it neutral. Retrieval confidence should inform behavior through ranking and weighting, not through prompt-embedded instructions. The model should decide what to do with uncertain information — not be told to dismiss it before it even considers it.
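One way to let confidence inform behavior through ranking rather than prompt text is confidence-weighted scoring at retrieval time. A minimal sketch; the weight values and function names are assumptions, not Sabine's implementation:

```python
# Confidence influences ordering via scoring, so no per-tier
# instruction ever reaches the prompt. Weights are illustrative.
TIER_WEIGHT = {"HIGH": 1.0, "MEDIUM": 0.7, "LOW": 0.4}

def rank_memories(candidates):
    """Order memory texts by confidence-weighted similarity.

    `candidates` is a list of (tier, similarity, text) triples --
    a hypothetical shape for this sketch. LOW-tier memories are
    down-weighted, not discarded, and the model sees them with no
    editorial framing attached.
    """
    scored = sorted(candidates,
                    key=lambda c: TIER_WEIGHT[c[0]] * c[1],
                    reverse=True)
    return [text for _, _, text in scored]
```

The point of the design: a LOW-tier memory with a strong similarity score can still outrank a weak HIGH-tier one, which is exactly the judgment call the prompt-embedded instruction was taking away from the model.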