Case Study · Systems Design + AI ← Portfolio
Counterparty Drift

Exploit the system all you want, but your winnings only survive if the system does.

It's a financial-system card game where everyone is complicit — you publicly prop up a system you're privately bleeding. The AI's easy fix for a late flaw was to turn it into Werewolf, good versus evil; I refused, because that refusal is the game.

Your counterparty is whoever sits on the other side of your deal, and their risk quietly becomes yours. Every player has two roles — the institution publicly holding the system up, and the operator privately bleeding it. The drift is that pull between what you show and what you do.

My real job was to build this without over-building it. On every decision the question wasn't what to add but how little the problem actually needed — how much AI, how much machinery I could leave out — then holding that line.

Role
Sole designer, architect & engineer
Stack
React · TypeScript · Vite — evolving to a server-authoritative WebSocket monorepo (ADR-0011)
Live
Watch a simulation · Play a round
Source
Private repository — full decision log & ADRs available on request
01 — The problem

A system everyone props up and everyone undermines

Hidden-role games give you suspicion. Economic games give you scarcity. Neither captures what actually makes a financial system tense: everyone needs it to hold together, and everyone is rewarded for quietly bleeding it. Counterparty Drift puts that contradiction in the middle of the table — a public role you play in the open, a hidden one you play in secret, and one shared number, legitimacy, that everyone watches.

The hard part was never the rendering. It was keeping the roles genuinely different from one another, and keeping the AI opponents predictable enough that you can actually deduce who's who — without reaching for more machinery than the problem needed.

BuoyantStableStressedFragileCollapse

Legitimacy runs 0–100 internally, displayed as a single number. Five named bands give every other system — headlines, AI behaviour, win conditions — a shared state to read from. (ADR-0002.)

02 — How the work was run

Shippable at the end of every phase

I work in phases, each one useful on its own even if I stopped there. Anything hard to reverse gets written down as an ADR before I build it, and the whole thing runs on a project board like a real team backlog — just with one person on it. Solo, but run like a team.

  • 0FoundationRepo, docs, ADRs, CI, project boardDone
  • 1Design substrateLegitimacy mechanics, AI behaviour, headline generationDone
  • 2Single-player playable + first playtestHuman player, information filtering, the simultaneous revealDone
  • 3The extraction-model pivotRemoving the hidden referee; aggregation over targeting; an individual win ledgerCurrent
  • 4Multiplayer at a shared tableLocal-network first, cloud by swapping one interfaceNext
  • 5Playtest & tuneReal sessions, parameter calibration, retrospectivePlanned

The board isn't decoration. The large majority of the work is shipped — a main board plus a second already open for the next phase — all moving through a real Backlog → Ready → In Progress → Review → Done flow. The useful part is how the work is split: I track design, engine and tuning as separate kinds of work — a decision that needs thinking, a correctness fix, and a number to calibrate aren't the same job. Keeping them apart is what stops shipping from quietly overruling thinking — which, given the extraction pivot, is not a hypothetical.

03 — Key decisions

Every decision audited against one rule

Design principle · Physical-First
"It must work as a card game at a table."

One rule governs everything, and every decision record ends by auditing against it. A mechanic only survives if you could play it with cards, tokens and people in a room — the simultaneous reveal, the hidden roles, the legitimacy track all pass. It's what keeps the digital build honest: I judge a feature against the table, not against how easy it is to code.

The records don't just pile up — they revise each other under real pressure. 0002 sets the state everything else reads from; 0003 fixes the AI approach; 0011 replaces the single-player model from 0007. And when a real-table playtest exposed the flaw in the shipped design, the extraction pivot rewrote six earlier records at once. A decision log that corrects itself says more than one that only grows.

ADR-0012 Accepted The Extraction Model — duality without a referee

The AI wanted to make it Werewolf. I said yeah… nah — the whole point is that everyone's complicit.

A win path I'd added let players attack each other directly, and settling it needed the app to compare two hidden cards and pick a winner — a referee a real card table can't have. The AI's fix was to recast the whole game as Werewolf: innocents versus bad players. That removes the referee by removing the point. Werewolf has innocents; this game has none — everyone is publicly responsible and privately bleeding the system. So I refused it, dropped the targeting, and recast those moves as draining the shared system, settled by tallying everyone's contributions — no referee, every role intact.

Why it mattersThe test underneath it — hidden comparison needs a referee; hidden aggregation doesn't — is the kind of rule you only find by building the wrong thing first and admitting it. Acting on it meant rewriting six accepted records rather than taking the easy genre-swap that already worked on screen. Catching that a shipped feature had quietly broken my own founding rule, and refusing the shortcut that would have gutted the game, is the part of this work I'd most want to be judged on.

ADR-0003 Accepted Personality-Weighted Action Selection

The fix is not full AI

The AI opponents pick actions by scoring cards against a four-axis personality, then sampling with a per-character "conviction" temperature. It plays sub-optimally and in character on purpose — that's the design, not something to fix later.

Why it mattersOptimal play would push every character toward the same move and kill the deduction. Bias over competence is the point — if you can roughly predict a character, their secret play becomes readable. The cheapest model that did that was the right one. And because selection is a pure function over a seeded RNG, every run is reproducible: "why did it do that?" has a log line, not a shrug.

Also in the log: ADR-0011 (multiplayer transport) keeps the engine transport-agnostic, so a local build pivots to cloud without a rewrite; ADR-0014 (cash-out & win conditions) makes the title literal — your paper winnings evaporate if the system you're bleeding collapses. The extraction pivot itself is a cluster: ADR-0013 keeps each role's fantasy as a "system dimension" once the duels are gone; ADR-0015 (collusion) and ADR-0016 (the whistleblower) re-derive the only player-to-player moves that survive without a referee — consent and terminal reveal; ADR-0017 re-points the round-summary machinery. ADR-0002 (Legitimacy Scale) stays the state foundation. Full set available to reviewers on request.

04 — Working with AI, concretely

The collaborator has guardrails

AI was a real collaborator here, but a directed one. The discipline is in a set of working rules I keep in the repo, not in good intentions. The point isn't that AI helped — it's how I kept it on a short leash.

  • Read the doc first. No proposing alternatives before reading the relevant ADR. Context precedes opinion.
  • ADR before anything hard to reverse. Architectural decisions get a written record with options considered and consequences — before the code, not after.
  • Phase awareness. When a current-phase task seems to need a future-phase capability, the answer is scope reduction, not scope expansion.
  • Right-size the intelligence. The cheapest model that meets the goal wins. "Full AI" was available and was the wrong investment — see ADR-0003.
  • Match existing patterns. No second way to do something the codebase already does once.
05 — What's still open

Honest about the edges

The extraction model is decided but not tuned. The balance between bleeding the system and holding it up, when a cash-out should pay, the whistleblower's thresholds — those are starting numbers, and only real sessions and the AI simulator will settle them. Knowing exactly which numbers are still guesses, and why they're safe to leave until the playtests come in, is part of the work too.