Why AI context degrades — and the architectural fix that actually works
Moe Hachem - March 30, 2026
Every team that works with AI long enough hits the same wall. The sessions start sharp. The model knows the project, produces relevant output, picks up where you left off. Then, somewhere around prompt 150 or 200, it starts drifting. You find yourself re-explaining decisions that were already made. The AI confidently contradicts something you established three weeks ago. You spend more time correcting than building.
Most people call this a model limitation. It isn’t. It’s an architectural failure — one you’re responsible for, which means it’s also one you can fix.
What’s actually happening
Large language models don’t have persistent memory between sessions. This is widely understood. What’s less understood is the practical consequence: every conversation starts from zero. The model doesn’t know your codebase, your product decisions, your team’s conventions, or any of the reasoning that got you to where you are. It knows only what you put in the current context window.
The naive fix is to put more context in. Paste in the relevant code. Add the design doc. Include the previous conversation. This doesn’t solve the problem — it delays it and makes it more expensive. You’re now consuming tokens on context management instead of actual work. And as the project grows, the amount of context you need to include grows with it, until you hit the window limit or the costs become prohibitive.
The real fix is structural. Instead of feeding the AI everything it might need, you give it a compact index that tells it where to find what it needs.
What SR-SI actually is
SR-SI stands for Simulated Recall via Shallow Indexing. The name is precise: the model doesn't actually remember; it simulates recall by consulting a maintained index before every task.
The index is compact by design. It contains one line per file — path and purpose, not content. It records architectural decisions, patterns, business rules, and cross-component dependencies in compressed form. The AI reads the index first, then retrieves only the specific files relevant to the current task.
The practical effect: the model re-orients itself on demand rather than depending on conversational history that degrades over time.
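The mechanism above can be sketched in a few lines. The index format, file paths, and matching logic here are illustrative assumptions, not the methodology's canonical format; the point is only to show the shape of "one line per file, path and purpose, retrieval before work":

```python
# Minimal sketch of an SR-SI-style index (format is illustrative, not canonical).
# One line per file: "path :: purpose". The agent reads this first, then
# fetches only the files relevant to the current task.

INDEX = """\
src/auth/session.py :: issues and validates signed session tokens
src/billing/invoice.py :: invoice generation; depends on src/auth/session.py
docs/decisions.md :: architectural decisions, one dated entry per decision
"""

def load_index(text: str) -> dict[str, str]:
    """Parse the one-line-per-file index into {path: purpose}."""
    entries = {}
    for line in text.strip().splitlines():
        path, _, purpose = line.partition(" :: ")
        entries[path] = purpose
    return entries

def relevant_files(index: dict[str, str], task_keywords: set[str]) -> list[str]:
    """Shallow retrieval: match task keywords against purpose lines only,
    never against file contents -- that is what keeps the index compact."""
    return [path for path, purpose in index.items()
            if task_keywords & set(purpose.lower().split())]

index = load_index(INDEX)
print(relevant_files(index, {"invoice"}))  # only the billing file is pulled in
```

The design choice worth noticing: retrieval runs against purpose lines, not file bodies, so the cost of re-orienting stays proportional to the number of files, not to the size of the codebase.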
The numbers
I’ve been measuring this across three real projects of increasing complexity, documented in a 50-page research paper published in February 2026.
The baseline — standard conversational workflow without indexing — established a consistent inefficiency ratio of approximately 2.56 tokens per line of code to maintain architectural awareness. Context drift began appearing reliably around 200 prompts.
With SR-SI applied mid-project on a codebase roughly twice the size of the baseline, the net token efficiency ratio dropped to 0.37 tokens per LOC — an 85.5% reduction. The project ran to 1,006+ prompts with zero context-loss events. Zero re-explanations of established architecture.
The modular variant, applied to a 66,475-line production codebase, reduced per-task context load from 15,641 tokens to approximately 1,645 — an 89.5% reduction per task. The cumulative improvement from unstructured conversational baseline to modular SR-SI is approximately 106x, measured by the Token Coherence Metric (lines of code coherently managed per net token consumed).
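The reported ratios can be sanity-checked from the figures above. The Token Coherence Metric is defined in the text as lines of code coherently managed per net token consumed; treating the per-task token figures as the basis for the modular variant is my reconstruction, not the paper's exact procedure:

```python
# Sanity-checking the reported ratios. The arithmetic and the per-task basis
# for the modular TCM are my reconstruction, not the paper's exact method.

baseline_tokens_per_loc = 2.56          # baseline inefficiency ratio
srsi_tokens_per_loc = 0.37              # mid-project SR-SI

reduction = 1 - srsi_tokens_per_loc / baseline_tokens_per_loc
print(f"{reduction:.1%}")               # matches the reported 85.5% reduction

loc = 66_475                            # modular-variant codebase size
per_task_before, per_task_after = 15_641, 1_645
per_task_reduction = 1 - per_task_after / per_task_before
print(f"{per_task_reduction:.1%}")      # matches the reported 89.5% reduction

tcm_baseline = 1 / baseline_tokens_per_loc   # LOC managed per token, baseline
tcm_modular = loc / per_task_after           # LOC managed per token per task
print(f"{tcm_modular / tcm_baseline:.0f}x")  # on the order of 100x
```

This reconstruction lands on the order of 100x, consistent with the reported ~106x; the small gap presumably comes from details of how net tokens are counted in the paper, which the text above does not spell out.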
What this means in practice
The 106x figure is striking. The practical implication is more important: you can build a complex, multi-product system with a single developer and AI agents, across months of development, without ever having to re-explain an architectural decision. The index holds the structure. The AI reads the index. The conversation can go as long as the project needs it to.
This is the system that let one person build five production products in parallel. Not raw AI capability; architecture built around it.
SR-SI: The methodology that gives AI persistent memory across any long-running project