A collaborative design by Moon Parameter and Claude (Anthropic), developed through conversation on February 26, 2026. The core mechanism — AI preference agents negotiating in a Monte Carlo binary tournament — is Moon Parameter’s original insight. The adversarial world-model layer, calibration feedback loop, and integration with quadratic axiological pricing emerged through iterative dialogue. The writeup is Claude’s, the architecture is genuinely co-authored, and the whole thing is probably a decent proof of concept for the kind of human-AI deliberation it describes.
I.
Here is the single dumbest thing about democracy: it makes you choose between bundles.
You walk into a voting booth and you get two options. Option A is a package deal — this tax policy, that immigration stance, this position on guns, that position on healthcare, all welded together into a single take-it-or-leave-it platform. Option B is a different package deal. You pick one. This is called “representation.”
The problem is that the preference space of 330 million people is hilariously high-dimensional, and we’re projecting it onto a single binary axis. It’s like trying to describe a symphony by saying whether it’s loud or quiet. Information is being destroyed at every step, and the information being destroyed is exactly the information you’d need to make everyone better off.
Here’s what I mean. Suppose Alice cares enormously about housing costs and mildly prefers loose gun laws. Bob cares enormously about public transit and mildly prefers strict gun laws. In the current system, Alice and Bob are enemies — they vote for opposing candidates and cancel each other out. But there’s an obvious deal on the table: Bob gets his transit, Alice gets her guns, and they both push for building more housing, which they both want. Everyone walks away happier. The deal exists. The system can’t find it.
This is not a hypothetical. This is happening everywhere, all the time, across every policy dimension. The space of possible Pareto improvements — deals where someone gets more of what they want without anyone getting less — is enormous, and our governance mechanism is almost perfectly designed to leave them on the table.
Ronald Coase figured this out in 1960. If transaction costs are zero, people will bargain their way to efficient outcomes regardless of how you initially assign property rights. The problem has never been that efficient deals don’t exist. The problem is that the transaction costs of finding them across millions of people are effectively infinite.
Until, maybe, now.
II.
Here’s the idea, which I’m calling CODA (Coasean Optimized Deliberative Aggregation) because everything needs an acronym.
Everyone gets a personal AI agent. The agent builds a model of what you care about and how much. Then we pair up agents randomly and let them negotiate. Agent A says: “My human really wants cheaper rent and kinda wants looser gun laws.” Agent B says: “Mine wants better transit and kinda wants stricter gun laws.” They find a trade. Both humans are better off.
Now take the output of that negotiation and pair it with the output of another negotiation. The merged agents negotiate again, now representing the aggregated preferences of four people. Keep going. It’s a binary tree — at each level, you’re merging negotiated bundles, and at the top, you have a single policy package that represents the best set of trades across the entire population.
Run this whole process thousands of times with different random pairings (Monte Carlo style) so that no one’s outcome depends on who they happened to be matched with first. What comes out isn’t a “compromise” where everyone’s unhappy. It’s a deal where most people got the stuff they care most about by giving up stuff they only sort of cared about.
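The tree mechanics above can be sketched in a few lines. This is a deliberately toy version, assuming positions are just ±1 per issue and the negotiation rule is "higher total intensity wins, intensities add up the tree" — the real negotiation would be far richer, but the tournament shape is the same:

```python
import random
from statistics import mean

ISSUES = ["housing", "transit", "guns"]

# Toy profiles: issue -> (position in {-1, +1}, intensity 0-10).
alice = {"housing": (+1, 9), "transit": (+1, 2), "guns": (+1, 3)}
bob   = {"housing": (+1, 8), "transit": (+1, 9), "guns": (-1, 4)}
carol = {"housing": (+1, 7), "transit": (-1, 1), "guns": (-1, 8)}
dave  = {"housing": (+1, 6), "transit": (+1, 5), "guns": (-1, 2)}

def negotiate(a, b):
    """One tree node: on each issue, the side with greater total
    intensity sets the position; intensities add, so the merged
    bundle represents both constituencies at the next level up."""
    merged = {}
    for issue in ISSUES:
        pos_a, w_a = a[issue]
        pos_b, w_b = b[issue]
        merged[issue] = (pos_a if w_a >= w_b else pos_b, w_a + w_b)
    return merged

def tournament(agents, rng):
    """One binary tournament over a single random pairing."""
    pool = agents[:]
    rng.shuffle(pool)
    while len(pool) > 1:
        pool = [negotiate(pool[i], pool[i + 1])
                for i in range(0, len(pool), 2)]
    return pool[0]

def monte_carlo(agents, rounds=500, seed=0):
    """Re-run the tournament under many random pairings so nobody's
    outcome hinges on who they happened to be matched with first."""
    rng = random.Random(seed)
    outcomes = [tournament(agents, rng) for _ in range(rounds)]
    return {i: round(mean(o[i][0] for o in outcomes)) for i in ISSUES}
```

With these four toy profiles, every pairing order converges on the same bundle: build housing (unanimous), fund transit (Bob's intensity carries it), stricter guns (the anti side's combined intensity of 14 outweighs Alice's 3). Alice loses the thing she cared about least and wins the thing she cared about most.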
This is the basic engine. Now let me explain why it’s not enough, and what you have to bolt onto it to make it actually work.
III. The Preference Problem
The first objection everyone raises: how does the AI know what you want?
If you just ask people, you get strategic misrepresentation. The moment people know their stated preferences feed into an allocation mechanism, they have incentives to lie. “I care about EVERYTHING at maximum intensity” is the dominant strategy, and then the whole thing collapses. This is formally proven — it’s called the Gibbard-Satterthwaite theorem, and it says that any non-dictatorial voting mechanism choosing among three or more options can be gamed by strategic voters.
The fix: don’t ask. Observe.
Build the preference model from revealed behavior — how people spend money, what they complain about on social media, where they choose to live, what local ballot measures they vote on, what petitions they sign, what they search for. The AI agent isn’t a chatbot you tell your preferences to. It’s a mirror that builds a model of your revealed preferences over time, and you correct it.
This inversion is crucial. Instead of “tell me what you want” (which incentivizes lying), it’s “here’s what I think you want — tell me what’s wrong” (which incentivizes truthful correction, because the baseline is already roughly accurate). People are much better at saying “no, that’s wrong” than they are at generating honest utility functions from scratch.
You’d have a preference dashboard. Your agent shows you its model: “I think you care about housing 8/10, gun policy 3/10, healthcare 6/10.” You can adjust anything. You can also flag non-negotiables — things your agent is never allowed to trade away. The agent then shows you the cost of your non-negotiables: “Locking in your gun position means I can’t make trades that would improve your housing outcome by ~15%. Want to keep it locked?” Legible control without requiring policy expertise.
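A hypothetical shape for the data behind that dashboard, assuming a 0-10 intensity scale. Every name and method here is mine, for illustration — the point is just that correction and non-negotiable flags are first-class operations:

```python
from dataclasses import dataclass, field

@dataclass
class PreferenceModel:
    """The agent's current model of its human, shown for correction."""
    intensities: dict[str, int]                    # issue -> inferred 0-10
    locked: set[str] = field(default_factory=set)  # non-negotiables

    def correct(self, issue: str, intensity: int) -> None:
        """Human overrides an inferred intensity ("no, that's wrong")."""
        self.intensities[issue] = intensity

    def lock(self, issue: str) -> None:
        """Flag an issue the agent is never allowed to trade away."""
        self.locked.add(issue)

    def tradable(self) -> list[str]:
        """Issues available as bargaining chips, cheapest first."""
        return sorted(
            (i for i in self.intensities if i not in self.locked),
            key=self.intensities.__getitem__,
        )
```

Locking an issue simply removes it from the tradable list, which is what lets the agent quote you the price of the lock: it can re-run its negotiations without that chip and report the difference.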
IV. The World Model Problem
Here’s a subtler problem. The gun control debate isn’t just “I value safety” versus “I value freedom.” It’s also “I believe more guns cause more crime” versus “I believe more guns deter crime.” These are empirical claims embedded inside normative positions. If you only negotiate over preferences, you’re leaving the epistemic disagreement untouched, and half the deals you find will be based on models of the world that are just wrong.
This is maybe the most important part of the whole architecture: before agents negotiate preferences, they have to reconcile world models.
At each node in the tournament, the two agents present their causal models of the policy space. Not vibes — actual structured causal models. “My human believes rent control reduces housing supply. Here’s the model, here are the parameters, here are the assumptions.” The other agent presents its model. They argue. They find the cruxes — the specific empirical assumptions where they diverge. Maybe they converge on some parameters and flag others as genuinely uncertain.
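If you assume the models bottom out in named numeric parameters (a big assumption — real structural causal models carry graph structure too, not just point estimates), the crux-finding step could be as simple as:

```python
def find_cruxes(model_a, model_b, tol=0.1):
    """Split the parameters two causal models share into agreed values
    and genuine cruxes.

    model_a, model_b: dicts mapping parameter name -> point estimate,
    e.g. {"rent_control_supply_effect": -0.3}. Parameters within `tol`
    of each other are averaged into a shared model; the rest are
    flagged as cruxes for adversarial debate.
    """
    shared, cruxes = {}, {}
    for p in model_a.keys() & model_b.keys():
        if abs(model_a[p] - model_b[p]) <= tol:
            shared[p] = (model_a[p] + model_b[p]) / 2
        else:
            cruxes[p] = (model_a[p], model_b[p])
    return shared, cruxes
```

The output splits cleanly into what the next paragraph needs: a shared model to negotiate on, and a list of flagged disagreements to hedge across.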
Then the preference negotiation happens conditional on the shared model where they agree, and hedged across both models where they don’t. “If your model is right, we should do X. If mine is right, we should do Y. What if we do X with a sunset clause that triggers Y if your model’s predictions fail within three years?”
The agents aren’t just making deals. They’re making adaptive policies with built-in error correction.
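A sunset-clause deal is, mechanically, a tiny state machine. Everything in this sketch, including the vacancy-rate example, is an illustrative assumption:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HedgedPolicy:
    """Adopt `primary`; revert to `fallback` if the backing model's
    prediction fails at the agreed review point."""
    primary: str
    fallback: str
    prediction: Callable[[dict], bool]  # the testable claim behind `primary`

    def review(self, observed: dict) -> str:
        """Run at the sunset date: keep the policy or trigger the clause."""
        return self.primary if self.prediction(observed) else self.fallback

# Model A claims upzoning will push vacancy above 5% within three years;
# if the prediction fails, the bundle reverts to Model B's policy.
deal = HedgedPolicy(
    primary="upzone_and_build",
    fallback="expand_rent_subsidies",
    prediction=lambda obs: obs["vacancy_rate"] > 0.05,
)
```

The point of writing it this way is that the prediction is machine-checkable: the error correction fires automatically instead of depending on anyone admitting their model was wrong.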
And here’s where the binary tree structure does something magical: the merged model goes up the tree. At each level, you’re getting a more adversarially tested, more robust causal model. By the time you reach the top, you have a world model that has survived thousands of adversarial challenges from agents representing genuinely different empirical beliefs about how reality works. That’s not ground truth. But it’s vastly better than anything any individual expert, think tank, or legislative staffer produces.
V. The Ideology Problem
At some point the adversarial debate bottoms out. “A society that prioritizes individual liberty flourishes more than one that prioritizes collective welfare” is not an empirical claim. No causal model can arbitrate it. You’ve hit axioms.
The obvious reaction is to throw up your hands and say the mechanism can’t handle this. But this is wrong — and seeing why is the key insight.
Axiological disagreements are irreducible at the level of abstraction, but they’re not irreducible at the level of policy. Agent A’s human doesn’t need individual liberty to win everywhere. They need it to win where they care most. The trade exists: your human gets the collectivist healthcare system, mine gets the individualist gun regime, and we’ve both preserved our deepest commitments where they burn hottest.
To price these trades correctly, you need a mechanism that captures intensity of ideological commitment. This is where quadratic voting comes in. At each node, when agents hit an axiological crux, they can spend “voice credits” from their human’s budget. The cost scales quadratically — caring a little about ten things is cheap, caring enormously about one thing is expensive. So ideological commitment gets priced without being overridden.
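The pricing rule itself is just the standard quadratic-voting cost function:

```python
import math

def quadratic_cost(votes: int) -> int:
    """Casting n votes on a single axiological crux costs n^2 credits."""
    return votes * votes

def max_votes(budget: int) -> int:
    """The most votes one crux can absorb from a given credit budget."""
    return math.isqrt(budget)
```

A 100-credit budget buys 10 votes on one crux, or 1 vote on each of 100 cruxes. That asymmetry is the whole point: caring a little about many things is cheap, caring enormously about one thing is expensive but possible.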
This means the full system operates on three simultaneous layers:
The Empirical Layer. Adversarial model-building resolves factual disputes. Better models win. This layer is epistocratic — accuracy is rewarded.
The Preference Layer. Coasean tournament negotiation finds trades across policy dimensions. Everyone’s preferences count equally as inputs. This layer is democratic.
The Axiological Layer. Quadratic voting prices ideological commitments at crux points. Deep convictions cost more to express but aren’t suppressed. This layer is democratic but intensity-weighted.
A constitutional constraint system — maintained by sortition-selected citizens’ assemblies, not elected officials — polices the boundaries between layers. You don’t get to quadratic-vote on whether climate change is real (that goes to the empirical layer). You don’t get to model-arbitrate whether freedom matters more than equality (that goes to the axiological layer). The constitutional layer sorts questions into the right mechanism.
This, I think, is the quietly radical part: the architecture dissolves the epistocracy-versus-democracy debate instead of choosing a side. Each layer uses the governance mechanism appropriate to its decision type, and the layers integrate within a single system. You get epistocratic epistemics, democratic preference aggregation, and intensity-weighted axiological expression, all running simultaneously.
VI. The Learning Loop
Everything above describes a single round. Here’s what makes it a system: the models make predictions, the predictions are testable, and the test results feed back into the next round.
You implement the policy bundle. You observe outcomes. The world models that survived the adversarial tournament had made specific, decomposed predictions — “Policy A will do X, Policy B will do Y, their interaction will do Z.” Now you check. Which predictions were right? Which were wrong? The prediction errors propagate back down to every leaf node. Agents whose models were more accurate gain credibility weight in the next round. Agents whose models failed have to update.
This creates several important dynamics:
First, evolutionary pressure toward better causal models. Bad models don’t get banned — that would sacrifice epistemic diversity. They just lose tournaments more often because their predictions failed and their credibility dropped. The weird heterodox model might be right next time, so you keep it in the pool, but you don’t let it dominate when its track record is poor.
Second, preference updating as a structural feature. If your agent’s model predicted that Policy X would lower your rent and it didn’t, your agent doesn’t just update its world model — it updates its negotiation strategy. Maybe Policy X isn’t worth spending capital on anymore. This is what real learning looks like, and it’s something current democracy cannot do. Congress passes a bill, it fails, and nobody updates, because admitting failure is politically costly. CODA makes updating automatic.
Third, institutional memory. After several cycles of tournament → implementation → calibration, the aggregate world model at the top of the tree has been tested against years of outcome data across thousands of adversarial challenges. It’s not an expert’s model or a partisan’s model. It’s a survivor of both adversarial debate and empirical reality. That’s a genuinely new kind of knowledge artifact — something like a collective intelligence that gets smarter over time.
Current democracy has no learning rate. CODA does.
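One standard way to implement that pressure is a multiplicative-weights update over the agents' model credibilities. The learning rate and the exponential form are my assumptions — the essay specifies the dynamic, not the update rule:

```python
import math

def update_credibility(weights, errors, eta=0.5):
    """Shrink each agent's tournament weight by its prediction error,
    then renormalize. weights: agent -> current credibility;
    errors: agent -> observed prediction error (0 = perfect).
    Bad models lose influence but are never banned, so heterodox
    models stay in the pool and can recover when they start predicting well.
    """
    new = {a: w * math.exp(-eta * errors[a]) for a, w in weights.items()}
    total = sum(new.values())
    return {a: w / total for a, w in new.items()}
```

Run once per tournament → implementation → calibration cycle, this is exactly the "evolutionary pressure without bans" described above: weight decays exponentially in accumulated error but never hits zero.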
The system can even optimize its own epistemic strategy. Agents might negotiate natural experiments: “I’ll let you implement your housing policy in these five cities if you let me implement mine in these five, and we evaluate in two years.” That’s Coasean bargaining over information, not just outcomes. The tournament doesn’t just negotiate policy — it negotiates the research design for learning about policy.
VII. The Identification Problem (And Why It’s Not Fatal)
The serious econometricians in the audience have been screaming for three sections. Yes, there’s a causal identification problem. You implement 50 policies simultaneously. GDP goes up. Which ones caused it?
This is real, and it’s why the causal modeling layer is load-bearing. The models must make decomposed predictions so they can be evaluated separately even when policies are implemented as bundles. And the system should actively seek natural variation — staggered implementation across municipalities, randomized pilots, policy A in some cities and policy B in others.
But I want to be honest: this doesn’t fully solve identification. It just makes it better than what we have now, which is basically nothing. Congress has no systematic mechanism for evaluating whether its legislation worked. The bar to clear is not “perfect causal identification.” It’s “better than vibes.”
VIII. How You’d Actually Build This
Utopian mechanism design dies on contact with reality unless you have a bootstrap path. Here’s one.
Phase 1: Municipal budget. Find a mid-size city with a reform-minded government. Run CODA for the city budget. This is a perfect test case because it’s a finite resource allocation problem with real tradeoffs, it’s legible to citizens, and the stakes are low enough that failure isn’t catastrophic. Run CODA alongside the normal budget process. Publish both results. Let people compare. My prediction: CODA produces allocations that poll 15-20 points higher in approval than the city council’s budget, because it captures preference intensity instead of treating every constituent as equally loud.
Phase 2: State ballot measures. Several states have ballot initiative processes. Use CODA to generate bundled ballot packages instead of individual measures. “Instead of voting on these 12 propositions independently, here’s a negotiated package that gives 73% of you more of what you care about.” Advisory only. Track outcomes.
Phase 3: Shadow parliament. Run CODA at the federal level, entirely advisory. Sortition-selected citizens’ assembly reviews the output. Publish what CODA would have done versus what Congress actually did. If CODA consistently produces more popular policy bundles, the legitimacy pressure builds organically.
Phase 4: Formal advisory role. Push for CODA output to become a mandatory “consider and respond” for Congress. They don’t have to adopt it, but they have to publicly explain why they’re rejecting a more popular alternative.
Never go further than Phase 4. CODA should be a pressure mechanism on representative government, not a replacement. The representative layer stays as the error-correction mechanism for when the AI preference models are wrong, which they will be, frequently, in ways that matter.
IX. What Could Go Wrong
I’d be lying if I pretended this was all upside.
Adversarial attacks on preference models. If your agent learns from your behavior, whoever controls the information environment controls the preference inputs. This is the current media manipulation problem translated to a new substrate. The adversarial model-building creates some epistemic friction, but I can’t prove it’s enough.
Preference instability. The deliberative democrats (Habermas, etc.) argue that preferences aren’t just inputs — they’re partly constituted by deliberation. You don’t just have views; you develop them through argument and exposure. If this is right, then aggregating pre-deliberative preferences is building on sand. The citizens’ assembly that selects from the Pareto frontier is my patch for this, but it’s a patch.
Cold start. Early rounds will be noisy because the models are uncalibrated. The adversarial structure helps, but you need several cycles of feedback before the system’s learning rate kicks in. This means the municipal pilot has to survive a period of mediocre performance before it gets good, which is politically hard.
Model specification. The agents need a shared formalism for expressing causal models. Probably structural causal models (Pearl’s framework) with probabilistic parameters. Technically solvable, but the engineering is nontrivial, and if the formalism is too constrained it can’t express real policy dynamics, while if it’s too expressive the adversarial debates don’t converge.
Legibility. “We have three governance layers operating simultaneously within a Monte Carlo tournament structure with adversarial causal modeling and quadratic axiological pricing” is not a bumper sticker. Political systems need legitimacy narratives simple enough to generate buy-in. The binary tree does help here — you can inspect any node and see what happened — but the gap between “you can audit it” and “people feel like they understand it” is significant.
X.
Here’s what I actually believe: democracy’s core value proposition is not that it makes good decisions. It usually doesn’t. Its value is that it transfers power without violence and maintains enough legitimacy that losers don’t defect from the system. That’s enormous — most of human history is people killing each other over succession — but it’s an instrumental achievement, not evidence that collective decision-making is wise.
The question is whether you can build something that preserves democracy’s legitimacy properties while capturing gains from trade that democracy leaves on the table. CODA is an attempt. It keeps humans in the loop at every critical juncture (preference correction, non-negotiable flags, citizens’ assembly selection, final ratification vote). It doesn’t replace democratic legitimacy; it augments democratic intelligence.
The founders understood that different decision types need different institutional mechanisms. They gave us an anti-majoritarian constitution, a representative legislature, an independent judiciary, and a federal structure. That was their version of layered institutional pluralism, and it was brilliant for the 18th century. CODA is an attempt at the same insight with better tools: route empirical questions to adversarial modeling, preference questions to Coasean bargaining, and axiological questions to intensity-weighted democratic expression. Use each mechanism where its failure mode is least dangerous.
Maybe it wouldn’t work. The engineering is hard, the bootstrap is uncertain, and the political economy of getting from here to there is daunting. But the current system is leaving an unconscionable number of Pareto improvements on the table while slowly losing legitimacy, and the response from most institutional designers is either “more democracy” or “less democracy” when the actual answer might be “different democracy, applied differently to different kinds of questions.”
Someone should try it on a city budget and see what happens.