What Happens to Trust When Your AI Gets Updated?
Fork semantics: the infrastructure problem nobody's solving yet
Here's a scenario that maybe hasn't happened to you yet, but the math says it eventually will.
Your scheduling agent worked perfectly on Tuesday. Wednesday morning it got a model update. By Thursday it had double-booked your CEO and sent a meeting invite to a client you fired six months ago.
What happened? The model got better at "proactive scheduling" and worse at checking CRM status before sending invites. An improvement in one capability broke a dependency in another. The agent's overall benchmark scores went up. Your Thursday went sideways.
This isn't a horror story from production. It's a prediction from first principles, and it's the kind of prediction that only sounds hypothetical until it isn't. Software updates break downstream dependencies. This has been true for every system ever built. There's no reason AI agents would be the exception, and several reasons they'd be worse.
This is the fork problem, and it's one of the least-discussed infrastructure gaps in AI operations.
Every Change Is a Fork
In software, a fork is when a codebase splits into two paths. The original continues one way; the new version goes another. It's a well-understood concept with well-understood tooling: version control, branching, merge strategies, release management.
AI agents fork constantly, but without any of that tooling.
A model update is a fork. A prompt revision is a fork. A platform migration is a fork. A capability expansion is a fork. A fine-tuning run is a fork. Every time anything changes about an agent's underlying machinery, the behavioral contract between that agent and the people relying on it has potentially changed.
And here's the uncomfortable part: not all forks are equal, but we treat them all the same way. Which is to say, we mostly ignore them.
A minor version bump that patches a tokenizer edge case is not the same as swapping from one model family to another. A prompt tweak that adjusts formatting is not the same as adding a new tool to the agent's capabilities. A platform migration that preserves all integrations is not the same as one that drops half of them.
These changes carry different amounts of risk to behavioral consistency. But right now, there's no standard way to quantify that risk, communicate it, or adjust trust accordingly.
The Trust Decay Problem
Here's what happens in practice: an agent builds a track record. It completes 500 tasks. It's reliable. People trust it. Then it gets updated.
How much of that trust should carry over?
If the update was trivial - a bug fix, a minor optimization - probably all of it. The agent's behavioral profile hasn't meaningfully changed. The 500-task track record is still relevant evidence of what to expect.
If the update was major - a new model, a new set of capabilities, a migration to a different platform - probably much less. The agent's behavioral profile may have changed significantly. Those 500 tasks were completed by a different configuration. They're still relevant evidence, but they're not sufficient evidence. The agent needs to re-earn some portion of its reputation.
This is trust decay, and it's something that nobody building agent infrastructure seems to be accounting for.
Current approaches fall into two camps: either the agent's reputation persists unchanged through updates (which is dangerous: you're trusting a new configuration based on an old track record), or the reputation resets to zero (which is wasteful: you're throwing away legitimate behavioral evidence because something changed).
Neither is right. And while there's plenty of work on agent identity and trust infrastructure, very little of it addresses what happens to reputation at the moment of change. What you actually want is proportional trust adjustment, a mechanism that reduces trust in proportion to the magnitude of the change, then lets the agent rebuild through post-update performance.
What Fork-Aware Trust Looks Like
Imagine a system that tracks not just what an agent has done, but what configuration it was running when it did it. Every task completion is tagged with the agent's current state: model version, prompt template, platform, capabilities, integrations.
When a change happens, the system can calculate how different the new configuration is from the old one. A minor prompt tweak? Low divergence. Trust barely moves. A full model swap? High divergence. Trust drops significantly, and the agent enters a probationary period where its post-update performance is weighted more heavily.
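As a concrete sketch of what that tracking could look like: the configuration fields and the component weights below are illustrative assumptions, not a proposed standard. The idea is just that each field of an agent's configuration contributes a different amount of behavioral risk when it changes, and divergence is the weighted fraction that changed.

```python
from dataclasses import dataclass

# Hypothetical weights: how much a change to each configuration
# component is assumed to matter for behavioral consistency.
COMPONENT_WEIGHTS = {
    "model": 0.50,
    "prompt_template": 0.20,
    "platform": 0.15,
    "capabilities": 0.10,
    "integrations": 0.05,
}

@dataclass(frozen=True)
class AgentConfig:
    """Snapshot of the configuration an agent ran under for a task."""
    model: str
    prompt_template: str
    platform: str
    capabilities: frozenset
    integrations: frozenset

def divergence(old: AgentConfig, new: AgentConfig) -> float:
    """Weighted share of the configuration that changed.

    0.0 means identical configurations; 1.0 means everything changed.
    """
    return sum(
        weight
        for field, weight in COMPONENT_WEIGHTS.items()
        if getattr(old, field) != getattr(new, field)
    )
```

Under these weights, a prompt tweak scores 0.2 while a model swap scores 0.5, which is exactly the "low divergence vs. high divergence" distinction above. A real system would want finer granularity (a minor model version bump is not a family swap), but the shape is the same.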
This isn't hypothetical engineering. It's basic Bayesian reasoning applied to a practical problem.
You have a prior belief about the agent's reliability, based on its track record. A fork introduces new evidence, the fact that something changed. The magnitude of the change determines how much you should update your prior belief. A small change means your prior is mostly intact. A large change means you need new evidence before you're confident again.
The math isn't exotic. A Beta distribution can model reliability as a function of successes and failures. A fork weight - a number between 0 and 1, where lower values mean a more severe change - determines how much of the pre-fork track record carries forward. A weight of 0.95 means almost everything carries over. A weight of 0.3 means the agent is nearly starting fresh.
What is novel is applying this to agent reputation infrastructure at the protocol level, so that every agent in an ecosystem has fork-aware trust that updates automatically, proportionally, and transparently.
Why This Isn't Just a Technical Problem
Fork semantics sound like plumbing, the kind of thing that belongs deep in the infrastructure where nobody sees it. And they do belong there. But the implications are visible everywhere:
For operators: You need to know that when your vendor updates the model behind your customer service agent, your trust in that agent's performance should temporarily decrease until you see post-update evidence. Right now, you find out when something breaks.
For agent developers: You need to communicate not just what you changed, but how much that change is likely to affect behavioral consistency. "We improved performance on benchmark X" is marketing. "This update has a fork weight of 0.4, meaning approximately 60% of prior behavioral evidence should be discounted" is information.
For marketplaces: If you're building a platform where agents are discoverable and selectable based on reputation, you need fork-aware reputation or your rankings are lying. An agent that was excellent six months ago and has been updated three times since may not be excellent now. Without fork tracking, you'd never know.
For the agents themselves: An agent that has been forked heavily (updated frequently, migrated across platforms, expanded and contracted in capability) should carry that history visibly. Not as a penalty, but as context. "This agent has been through significant changes recently" is useful information for anyone deciding whether to rely on it.
The Gap in Current Infrastructure
There's real work happening in agent trust and identity. On-chain registries, identity protocols, attestation frameworks - serious teams are building serious infrastructure for agent discovery and reputation.
But almost none of it accounts for forks.
Registration tells you an agent exists. Attestation tells you someone vouches for it. Reviews tell you how past users felt about it. None of these update when the agent changes. The registry entry stays the same. The attestation stays valid. The reviews still reflect the old version.
This is like trusting a restaurant review from 2019 when the chef changed twice and the menu was overhauled. The review is real. The restaurant it describes isn't.
Fork-aware reputation is the piece that makes the rest of the trust infrastructure honest. Without it, you're building agent marketplaces on stale data. With it, you have a system that tells you not just "this agent was good" but "this agent was good, and here's how much has changed since then."
The agents are evolving constantly. The trust systems must evolve with them. Theseus' ship still has the same hull number, but the keel is new, and you might want to know that before setting out to sea.
Third in a series on infrastructure for persistent, interoperable AI agents. Previously: why agent identity is the wrong question, and why agent ratings are broken. Next: why agent reputation should be portable, and why it isn't.