The memory problem

I used to think the problem with AI was intelligence.

That was the obvious story: better model, better reasoning, better output. Keep upgrading the engine and eventually the work gets easier.

Then I started using AI agents for real production work, not demos, and the problem showed up somewhere else.

The agent could write, refactor, and reason through a system with me. For a while, it felt like the future. Then the context window would reset, and suddenly we were doing orientation theater again.

Not starting from zero exactly; worse, in some ways.

I would paste the relevant files back in. Re-explain the architecture. Re-state decisions we had already made. Remind it why a certain path was rejected. Ten minutes of re-orientation before a single useful line of code. Sometimes more than once in the same day.

At first I treated that as normal overhead. Then I started measuring it.

Not emotionally, but in tokens, in time, in repeated setup cost, and in the compounding drag of working with a system that could reason well inside a session but could not reliably remember how the project worked across sessions.

The number that stopped me was simple:

A single task on the same codebase consumed 15,641 tokens using the normal approach. After I changed how the agent oriented itself - not the model, not the codebase, just the structure of what it consulted and when - the same kind of task dropped to 1,645 tokens.

Same codebase, same model, same kind of work.

That is a 9.5x improvement from discipline alone.

That was the moment the problem became obvious. The agent did not need a larger pile of context. It needed to know what to look for before it looked.

We had been treating AI memory like a storage problem: hold more, retrieve more, keep more in the window. Human memory does not work that way. You do not remember everything you have ever seen; you remember what matters, and you know what matters because you have built pathways back to it.

That is the part we were missing.

Memory is not just recall; it is orientation.

So I built the smallest structure I could: an index the agent reads before acting, and updates after acting. Not a giant document or a second-brain cosplay, but a shallow map of what matters, where it lives, and why it matters.

I called the methodology SR-SI: Simulated Recall via Shallow Indexing.
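To make that concrete, here is a minimal sketch of what a shallow index might look like, assuming a Python representation. The field names and the example entries are mine, purely illustrative; the only thing they are meant to capture is the "what, where, why" shape described above.

```python
# Hypothetical shape of a shallow index: a flat map of what matters,
# where it lives, and why it matters. All names and entries below are
# illustrative, not taken from any real project.
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    what: str   # the thing worth remembering (a module, a decision, a constraint)
    where: str  # a pointer to the source of truth, not a copy of it
    why: str    # one line on why it matters or why an alternative was rejected


@dataclass
class ShallowIndex:
    entries: list[IndexEntry] = field(default_factory=list)

    def add(self, what: str, where: str, why: str) -> None:
        self.entries.append(IndexEntry(what, where, why))

    def render(self) -> str:
        # Small enough to prepend to any prompt without blowing the budget.
        return "\n".join(f"- {e.what} -> {e.where} ({e.why})" for e in self.entries)


index = ShallowIndex()
index.add("auth flow", "src/auth/", "sessions over JWTs; decision recorded in an ADR")
index.add("migrations", "db/migrations/", "hand-written SQL only, never auto-generated")
print(index.render())
```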

The rule was simple:

Before any action, consult the index; after any action, update it.

Never let the index become the project. It should preserve only enough structure for the right context to be reconstructed on demand.

That last part matters. Compression is not a limitation in SR-SI. It is the method.
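A rough sketch of the consult-then-update loop, building on the ShallowIndex above: run_agent is a placeholder for whatever model call you actually use, and the pipe-separated summary format is an assumption of this sketch, not part of the method itself.

```python
# Consult the index before acting, update it after acting.
from typing import Callable


def perform_task(task: str, index: ShallowIndex, run_agent: Callable[[str], str]) -> str:
    # 1. Consult before acting: the agent orients itself from the shallow map
    #    instead of having the whole project pasted back in.
    prompt = f"Project index:\n{index.render()}\n\nTask:\n{task}"
    result = run_agent(prompt)

    # 2. Update after acting: record only what changed and why, never the
    #    full output. Compression is the method, not a limitation.
    summary = run_agent(
        "List anything from this result worth indexing, one per line, "
        "formatted as: what | where | why\n\n" + result
    )
    for line in summary.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            index.add(*parts)

    return result
```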

The point is not to make AI remember everything. The point is to stop making it reload the world every time it needs to work.

SR-SI solved the immediate problem I had in my codebase. Then it started raising a more uncomfortable question:

If this works for AI agents, where else does the same memory failure show up?

That question is what this series is about.