Beyond Retrieval

Toward knowledge systems that can decay, doubt, and evolve.

February 7, 2026

Your knowledge system treats a note from three years ago the same as something you confirmed yesterday. Same confidence. Same weight. Same retrieval priority. It doesn't care when you last verified it. It just sits there, equally sure about everything, forever.

Retrieval is good. Search, ranking, entity resolution, RAG, compaction, all of that is useful. But staleness is still under-modeled. Most systems can tell you what looks relevant. They are much worse at expressing something like: "I used to believe this strongly, but I'm less sure now."

So we compensate. We gather dates and surrounding context, hand them to a model, and ask it to infer whether something is stale, contradicted, or newly important. Sometimes that's good enough. But it's still a workaround. You're constrained by context windows, hoping you fetched the right material, and asking the model to do reasoning the system should often do itself. I'd rather stale beliefs, failed expectations, and tension between claims be first-class system mechanics than prompt-time guesses.

When I say memory, I mean stored traces: notes, messages, events, observations, documents. When I say knowledge, I mean the system's current working model of the world. Beliefs are the explicit claims inside that model, each with some degree of confidence and some basis in evidence. Expectations are what those beliefs imply should happen next. Contradictions are the places where the model no longer fits together cleanly.

If we call all of this "memory," we end up blurring storage, inference, confidence, and prediction into one bucket. A lot of the confusion around these systems starts there.

Recent memory systems are getting better at layering, compaction, retrieval, and temporality. That's useful, and some of it comes close. But the center of gravity is still usually recall, personalization, or context management. I'm after something slightly different: a runtime that keeps these ideas explicit enough to query, update, and compose, and that can tell when one of its own beliefs is getting weak, stale, or directly challenged.

Confidence should weaken over time without reinforcement. A belief can still be likely while becoming less trustworthy. Probability and confidence are not the same thing. Something can remain 90% likely while having low precision because it has gone unverified for too long. A lot of systems flatten those into one score and lose the distinction between "I'm confident this is true" and "this used to seem true and I haven't checked recently."
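To make the distinction concrete, here's a minimal sketch of a belief that tracks probability and confidence as separate numbers, with confidence decaying on a half-life until the belief is reinforced. The class, field names, and the 90-day half-life are all illustrative assumptions, not a real API.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """A claim whose probability and confidence are tracked separately.
    All names and the decay rate are illustrative assumptions."""
    claim: str
    probability: float            # how likely the claim is, given the evidence
    last_verified: float          # unix timestamp of the last reinforcement
    half_life_days: float = 90.0  # confidence halves after this long unverified

    def confidence(self, now: float) -> float:
        """Precision decays exponentially while the belief goes unverified;
        probability stays put until new evidence actually moves it."""
        age_days = (now - self.last_verified) / 86400.0
        return 0.5 ** (age_days / self.half_life_days)

    def reinforce(self, now: float) -> None:
        """Confirming evidence resets the decay clock, not the probability."""
        self.last_verified = now
```

A belief verified 90 days ago still reports probability 0.9 but only confidence 0.5: still likely, no longer trustworthy.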

A knowledge system doesn't just store facts. It also holds expectations about what should happen next: a client should respond within a certain window, a commitment should get followed up on, a flow meter should stay in range, a weekly pattern should continue until something changes. When the expected thing does not happen, that's a different signal from passive staleness. Now the system has prediction failure, not just age.
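An expectation like that can be represented directly: a prediction derived from a belief, with a deadline, whose failure is observable. This is a sketch under assumed names; "overdue" here is the prediction-failure signal, distinct from a belief merely aging.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Expectation:
    """A prediction a belief implies: some event should occur by a deadline.
    Field names are illustrative."""
    source_belief: str             # the belief this prediction derives from
    predicted_event: str           # e.g. "client replies within two days"
    due_by: float                  # deadline as a unix timestamp
    fulfilled_at: Optional[float] = None

    def status(self, now: float) -> str:
        """Pending, fulfilled, or overdue. Overdue is a prediction failure,
        which is a stronger signal than passive staleness."""
        if self.fulfilled_at is not None:
            return "fulfilled"
        return "overdue" if now > self.due_by else "pending"
```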

And these mechanics don't care much where signals come from. A client changing communication patterns, a calendar commitment slipping, a project going quiet, or a metric drifting out of its usual range all generate beliefs and expectations in roughly the same way.

Then there's the harder part: reasoning about absence. You notice when someone stops texting. You notice when a number flatlines. You notice when the thing that usually happens doesn't. Most systems don't represent that very well because the space of things that did not happen is effectively infinite. So the system needs scope. It needs a hot set: live expectations derived from current beliefs, active commitments, and patterns that matter right now. As beliefs decay and commitments resolve, expectations expire naturally. The system only watches what it currently has reason to watch. That also makes it searchable. Instead of brute-forcing the whole belief set, it can focus on the parts that are active, unstable, or no longer fitting together cleanly.
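The hot set can be as simple as a filter: an expectation stays watched only while it is unresolved and its backing belief still has enough confidence. The dict keys and the 0.2 threshold below are illustrative assumptions, but the shape shows how expiry falls out of decay for free.

```python
def hot_set(expectations, min_confidence=0.2):
    """Return only the expectations the system currently has reason to
    watch: unresolved, and backed by a belief whose confidence has not
    decayed below the threshold. Keys and threshold are illustrative."""
    return [
        e for e in expectations
        if not e["resolved"] and e["backing_confidence"] >= min_confidence
    ]
```

Resolved commitments and expectations whose source beliefs have decayed simply drop out, so the system never has to enumerate the infinite space of things that didn't happen.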

Once you have that, the violations themselves carry information. Something overdue is different from something contradicted, which is different again from something simply going missing. A task that didn't get done by Friday suggests one kind of response. A belief about a client that is directly contradicted by new evidence suggests another. A long-running pattern that just stopped suggests another.
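The three signals can be kept distinct at the type level. This is a hypothetical classifier, not a real implementation; the field names, the three-misses rule for a stopped pattern, and the precedence order are all assumptions.

```python
def classify_violation(exp, now, contradicting_evidence=0):
    """Separate the failure signals, since each suggests a different
    response. Field names and thresholds are illustrative."""
    if contradicting_evidence > 0:
        return "contradicted"     # new evidence directly opposes the belief
    if exp.get("recurring") and exp.get("missed_in_a_row", 0) >= 3:
        return "pattern_stopped"  # a long-running pattern just went quiet
    if now > exp["due_by"]:
        return "overdue"          # a single prediction missed its window
    return "holding"
```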

If the system believes a client prefers async communication, that belief might sit for a year and slowly lose precision. Maybe it moves into the hot set because it is worth re-verifying. Meanwhile the client starts scheduling video calls every week. That is no longer passive decay. It is a contradiction. The system shouldn't merely drift toward uncertainty. It should be able to say: this belief is not just old, it is being actively challenged.

Another failure mode is structural tension. Add enough beliefs and they start pulling against each other. The system thinks you prefer focused solo work, but also that you love pairing on hard problems. Both are grounded in recent evidence. Both may be partly true. But together they suggest a missing variable, a hidden condition, or a belief that is too coarse to survive contact with new evidence.
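One naive way to surface that tension: flag pairs of well-supported beliefs that make opposing claims within the same scope. The subject/attribute/value structure below is an illustrative assumption, and real beliefs won't decompose this cleanly, but it shows the kind of query a legible belief store makes possible.

```python
def find_tensions(beliefs, min_confidence=0.6):
    """Flag pairs of well-supported beliefs making opposing claims about
    the same subject and attribute. A flagged pair often points to a
    missing variable rather than a simple error. Structure is illustrative."""
    tensions = []
    for i, a in enumerate(beliefs):
        for b in beliefs[i + 1:]:
            same_scope = (a["subject"], a["attribute"]) == (b["subject"], b["attribute"])
            opposed = a["value"] != b["value"]
            both_supported = min(a["confidence"], b["confidence"]) > min_confidence
            if same_scope and opposed and both_supported:
                tensions.append((a["claim"], b["claim"]))
    return tensions
```

Notably, the right response to a flagged pair is often to split a belief ("prefers solo work" becomes "prefers solo work for routine tasks") rather than to discard one side.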

You can represent beliefs and evidence in a graph, and graphs are useful. You can also push preferences and patterns into model weights. Both approaches help. But neither is enough for the behavior I want. They don't automatically model staleness, maintain active expectations, or tell you which parts of the system are over-constrained and starting to fight. Once those things disappear into structure or weights, they also get harder to inspect, update explicitly, and reason about at the level of specific beliefs, expectations, and contradictions.

What I want is a knowledge layer that stays legible: this used to seem true, this prediction failed, these two beliefs are in tension, this is worth re-checking now. Not just because that's easier to debug, but because it makes the system more queryable and composable. You can ask why something changed. You can trace what evidence supported it. You can decide whether to revise a belief, split it into something more specific, or drop it entirely.

What makes sense to me, then, is to pull these primitives together into small domain-level systems. Draw a boundary around a set of beliefs, evidence, expectations, and contradictions, and now you have a working substrate for some part of the world: a person, a machine, a team, or a process.

Then those substrates can compose. A small system can stand on its own or become part of a larger one. A system that tracks one project can feed into a broader view of team health and planning. A system that models one person's patterns can become one component inside a larger personal or organizational knowledge system. It's the same primitives holding together at different scales.
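As a sketch of that composition, a substrate can be a small container for its own beliefs and tensions plus the substrates it includes, so instability rolls up from a project to the team view. Everything here, names included, is an illustrative assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Substrate:
    """A bounded domain system: its own beliefs and tensions, plus the
    child substrates it composes. An illustrative sketch."""
    name: str
    beliefs: list = field(default_factory=list)
    tensions: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def unstable(self):
        """Roll up tensions from this substrate and everything inside it,
        so a team-level view surfaces project-level trouble."""
        issues = [(self.name, t) for t in self.tensions]
        for child in self.children:
            issues.extend(child.unstable())
        return issues
```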

Most systems stop at retrieval. They help you find relevant things. What I want is a system that can also tell you when one of its beliefs is getting stale, when one of its expectations failed, and when two parts of its worldview no longer fit together cleanly.