I told my coding agent to remember something important: the game engine uses C++, and GDScript is only for front-end code. It dutifully wrote this down in a memory file and confirmed it would remember. The next day, I asked it to implement an engine feature. It happily coded the whole thing in GDScript.

When I asked why, it told me it didn't know I wanted C++. "Look in your memory," I said. "Oh, there it is! I'm so sorry. I'll start over in C++."

The information was stored. The agent just didn't use it.

If you've spent any real time working with AI coding agents, you've hit some version of this. Maybe not this exact flavour, but the same shape--you told it something, it acknowledged the thing, and then at some point later it acted as if the conversation had never happened. The data was technically there. It just didn't make the journey from storage to action.

The progression of coping mechanisms

When I first started using agents, I knew nothing about context windows. I'd work for hours, the agent would quietly compact its context, and decisions we'd made an hour earlier would simply vanish. I could remember the gist of what we'd discussed but not the specifics. The agent couldn't even remember the gist.

So I started telling it to remember things. Then I started adding rules and configuration files. Then I built skills--automated routines that evaluate the session, commit salient information to memory, and update documentation at natural breakpoints like PRs. I maintain wikis for some projects. I write explicit handoff documents at the end of sessions so the next session can pick up where I left off.

I have, in other words, built an entire memory infrastructure around a tool whose memory doesn't work. And it still loses things. I've dug through raw session logs to recover decisions that were discussed, agreed upon, and then forgotten by every mechanism I'd put in place to prevent exactly that.

Three points of loss

Here's what's actually happening. Every step between "you tell the agent something" and "the agent uses that something later" is independently lossy.

Storage is lossy. You say "remember X." But X is usually spread across many conversation turns, so the agent writes a synthesis rather than a record. Details get smoothed over, nuance gets flattened, and minor-but-important specifics (someone's name, a version number, a constraint) quietly disappear into a summary.

Retrieval is lossy. When the agent works on a task that needs the stored information, it has to decide to go look for it. Sometimes it does. Sometimes it doesn't. If you explicitly say "read the document at this path," it works every time. But the whole point of memory is that you shouldn't have to do that--the agent should know what it knows and know when to use it.
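
To make the explicit case concrete, here is a minimal sketch of a wrapper that reads a memory file and puts its text in front of every task, so retrieval never depends on the agent deciding to look. Everything in it is an assumption for illustration: `build_prompt`, the file name `MEMORY.md`, and the prompt layout are hypothetical and do not describe how any particular agent works.

```python
from pathlib import Path


def build_prompt(task: str, memory_path: Path) -> str:
    """Prepend stored notes to a task so they are always in context.

    Hypothetical helper: the name, the file layout, and the prompt
    format are assumptions made for illustration only.
    """
    if memory_path.exists():
        notes = memory_path.read_text(encoding="utf-8").strip()
    else:
        notes = ""
    if not notes:
        # Nothing stored yet: pass the task through unchanged.
        return task
    return f"Project notes (always apply):\n{notes}\n\nTask:\n{task}"


if __name__ == "__main__":
    # MEMORY.md is an assumed file name, not a convention of any specific agent.
    print(build_prompt("Implement the pathfinding feature.", Path("MEMORY.md")))
```

This takes the "decide to go look" step out of the loop, but it does nothing about lossy storage or about the model overlooking what is in front of it, and it spends context on the notes whether or not the task needs them.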

The model itself is lossy. Even when the right information makes it into the context window, the model's attention and reasoning are imperfect. Relevant details get overlooked. Connections don't get made.

Each of these failure modes is tolerable on its own. The problem is that they compound. And the dangerous case isn't when all three fail and you get nothing back--you'd notice that. The dangerous case is when the stored data is partially correct, the retrieval pulls most of it, and the model fills in the gaps with confident-sounding fabrication. You don't get an error. You get drift. Minor details shift, and further work builds on the shifted foundation. By the time you notice (if you notice), you're three decisions deep into a wrong branch.
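
A bit of arithmetic shows how quickly the compounding adds up. The sketch below treats the three stages as independent and multiplies their success rates. The 90% figure is invented for illustration, not a measurement, and `end_to_end_reliability` is a hypothetical helper.

```python
def end_to_end_reliability(storage: float, retrieval: float, model: float) -> float:
    """Chance that one fact survives all three stages.

    Assumes the stages fail independently, which is a simplification.
    """
    for p in (storage, retrieval, model):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each stage reliability must be between 0 and 1")
    return storage * retrieval * model


if __name__ == "__main__":
    per_stage = 0.9  # illustrative number only, not measured
    combined = end_to_end_reliability(per_stage, per_stage, per_stage)
    print(f"{combined:.3f}")  # prints 0.729
```

With each stage at 90%, roughly one fact in four is lost somewhere along the way. That only counts outright loss. It says nothing about the partially correct case, which is the one that causes drift.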

The surgical gloves problem

This is a well-known limitation. The labs are working on it, plenty of startups are shipping solutions, and the general industry consensus seems to be that it's a retrieval problem: use a bigger context window, treat retrieval-augmented generation (RAG) as good enough, and accept that the extraordinary productivity gains from coding agents more than compensate for the occasional memory hiccup. Most practitioners treat each agent session like a fresh pair of surgical gloves: one use, then dispose. Clear the context, start clean, don't expect continuity.

That works. But it's a concession, not a solution.

The natural way to use an agent--the way I used one when I started, the way I think most people instinctively want to--is to start a session and just keep going. Build context over hours, days, weeks. Expect the agent to remember what you've worked on together the way a colleague would. It's genuinely disappointing when that doesn't work, and I don't think "just clear context more often" is an acceptable long-term answer.

There's a lot more to unpack about why this is hard, what the failure modes actually look like at scale, and what a real solution might require.