The 200-Prompt Wall
Moe Hachem - December 10, 2025
I’ve spent the better part of this year building prototypes with AI assistance. Three production projects, each with increasing complexity, each following the same pattern.
The first fifty prompts feel magical. The AI understands context, builds features that integrate cleanly, maintains architectural consistency without constant reminders. You start to feel like you’re working with a genuinely collaborative partner who remembers what you built together yesterday.
By prompt one hundred fifty, small cracks appear. Minor inconsistencies. Architectural drift that’s almost imperceptible at first. The AI still sounds confident, still produces working code, but something has shifted.
At prompt two hundred, the system breaks down entirely.
“Where’s the authentication module again?”
“That component already exists in /core/auth. I showed you this three times today.”
“Why are you creating duplicate code when we have a perfectly functional implementation?”
I found myself re-explaining the same architectural decisions over and over. The AI would sound confident, give fluent responses, act like it understood the full picture. But it had fundamentally forgotten everything we built together over the past week.
The Cognitive Mirage
I call this phenomenon the Cognitive Mirage. AI’s conversational fluency creates a powerful illusion of persistent memory and understanding. The responses are articulate, the explanations sound thoughtful, the code suggestions seem contextually aware. But mechanically, the system is regenerating each response from statistical patterns, not recalling our shared architectural history.
This isn’t a bug in a specific tool. It’s a fundamental limitation of how Large Language Models maintain context: everything has to fit inside a finite token window.
In one of my projects, I tracked the cost of this context loss carefully. Approximately eight hundred prompts were spent re-teaching concepts I had already established. Hours of development time consumed not by building new features or solving novel problems, but by context management—repeatedly pasting file paths, re-explaining component locations, reconstructing architectural decisions the AI had “known” just days before.
The conversations would go like this:
Me: “Use the existing authentication system to protect this route.”
AI: “I’ll create an authentication middleware for you.”
Me: “No, we already have AuthManager in /core/auth. Use that.”
AI: “Got it, I’ll integrate with AuthManager.”
Three prompts later:
AI: “I’ve created a new authentication handler…”
Me: “Stop. AuthManager. It exists. Use it.”
This loop wasn’t malice. It was context truncation.
The Obvious Solutions Don’t Work
I tried everything.
Longer context windows? They just delay the collapse. The window eventually fills.
Better prompt engineering? Helps early, fails late.
Manual documentation? Turns the human into the AI’s memory management system—an expensive, brittle process.
Nothing solved the root issue: LLMs cannot maintain project coherence through conversation history alone.
The Realization
Humans don’t remember everything. We remember where to look.
LLMs don’t recall—they regenerate. Without external structure, they drift.
So instead of forcing the AI to “remember,” what if we gave it the scaffolding to reconstruct context on demand?
That insight became the foundation for Simulated Recall via Shallow Indexing (SR-SI)—a methodology that delivered a 7× improvement in token efficiency and extended coherent collaboration from ~200 prompts to over 1,000.
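The full mechanism is next week’s post, but to make “shallow indexing” concrete, here’s a minimal sketch of the general idea: walk the source tree, record each module’s path and its top-level symbols, and emit a compact index the AI can be pointed at instead of re-reading (or hallucinating) the codebase. The file name, the Python-only scan, and the output format here are illustrative choices for this sketch, not the SR-SI specification.

```python
# Sketch of "shallow indexing": one line per module, listing the path and the
# top-level symbols it defines. The index stays small enough to hand to the
# model on every task. PROJECT_INDEX.md is an illustrative name, not a spec.
import ast
from pathlib import Path


def index_module(path: Path) -> list[str]:
    """Return the top-level class and function names defined in a Python file."""
    tree = ast.parse(path.read_text(encoding="utf-8"))
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def build_shallow_index(root: str, out_file: str = "PROJECT_INDEX.md") -> None:
    """Write a compact map of the project: path -> exposed symbols."""
    lines = ["# Project Index (shallow)"]
    for path in sorted(Path(root).rglob("*.py")):
        symbols = index_module(path)
        if symbols:
            lines.append(f"- `{path}`: {', '.join(symbols)}")
    Path(out_file).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    build_shallow_index("src")
```

The point isn’t the script; it’s the shape of the artifact. The index is shallow on purpose: paths and names, not contents, so it costs a few hundred tokens instead of tens of thousands.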
What Changed
With SR-SI in place, the workflow shifted from chaotic to stable.
Me: “Add two-factor authentication to the dashboard.”
AI: [Checks index]
“Authentication is handled by AuthManager in /core/auth, which uses JWT sessions. I’ll extend AuthManager to include TOTP-based two-factor authentication while maintaining the current session flow.”
No drift. No re-teaching. No duplication.
In my most recent project—35,000+ lines of code, over 1,000 prompts—I experienced zero context-loss events. The AI maintained coherent understanding across weeks of work not by remembering, but by reorienting through the index.
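To show what “reorienting through the index” can look like in practice, here’s an illustrative sketch (again, not the SR-SI implementation) that prepends the shallow index to every request. The message-list shape below is the generic system/user chat format; adapt it to whatever client or tool you actually use.

```python
# Illustrative only: fold the shallow index into every request so the model
# reorients from the index rather than from decaying conversation history.
from pathlib import Path

# The index produced by the earlier sketch; name is illustrative.
INDEX = Path("PROJECT_INDEX.md").read_text(encoding="utf-8")


def build_messages(task: str) -> list[dict]:
    """Prepend the shallow index as a system message before every task."""
    return [
        {
            "role": "system",
            "content": (
                "Before writing code, consult this project index and reuse "
                "existing components instead of creating duplicates:\n\n" + INDEX
            ),
        },
        {"role": "user", "content": task},
    ]


messages = build_messages("Add two-factor authentication to the dashboard.")
```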
The result:
I stopped managing the AI’s memory and started building systems again.
What’s Next
Over the next six weeks, I’ll break down:
- How SR-SI works
- How it differs from RAG
- Why it prevents architectural drift
- How you can implement it today
The goal isn’t gatekeeping. It’s replication. This methodology should be validated, challenged, refined.
We’ve been thinking about AI memory wrong.
The answer isn’t more tokens.
It’s better architecture.
Next week: The mechanism behind SR-SI and the data that convinced me it fundamentally changes how we build with AI.
Find the whitepaper on the AI Memory Prosthesis here.
If you’re interested in implementing SR-SI or contributing to its validation, I’d love to hear from you.