Summary
When autonomous agents negotiate on behalf of human principals, adverse outcomes demand answers to two questions: which facts, disclosures, or contract clauses were decision-critical, and what changes would have altered the result?
We formalise this as counterfactual audit analysis. Given an observed decision, we model the coordination pipeline as a structural causal model and find the smallest intervention that would have changed the outcome. These minimal interventions decompose naturally into three interpretable classes:
- Evidence counterfactuals — changes to the submitted evidence that would alter clause satisfaction
- Clause counterfactuals — direct changes to policy clause outcomes
- Protocol counterfactuals — a different protocol selection that would change the decision under the same clauses
This decomposition tells an auditor at which level a decision was fragile — whether a small change to the evidence, the policy clauses, or the protocol selection would have flipped the result. It also enables layered verification: each class can be checked independently without recomputing the full pipeline.
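The clause and protocol classes can be illustrated with a toy pipeline. Everything below is a hypothetical sketch, not the paper's model: the evidence fields, clause thresholds, and the "unanimous"/"majority" protocols are illustrative assumptions, and evidence counterfactuals would analogously search over perturbations of the evidence itself.

```python
from itertools import combinations

# Hypothetical lending pipeline: evidence -> clause outcomes -> protocol -> decision.
# All names and thresholds are illustrative assumptions.

def clauses(evidence):
    return {
        "income_ok":  evidence["income"] >= 50_000,
        "debt_ok":    evidence["debt_ratio"] <= 0.4,
        "history_ok": evidence["missed_payments"] == 0,
    }

def decide(clause_vals, protocol):
    if protocol == "unanimous":
        return all(clause_vals.values())
    if protocol == "majority":
        return sum(clause_vals.values()) >= 2
    raise ValueError(protocol)

def clause_counterfactuals(evidence, protocol):
    """Smallest sets of clause flips that change the decision."""
    base_clauses = clauses(evidence)
    base = decide(base_clauses, protocol)
    names = list(base_clauses)
    for k in range(1, len(names) + 1):
        hits = [flipped for flipped in combinations(names, k)
                if decide({n: (not v if n in flipped else v)
                           for n, v in base_clauses.items()}, protocol) != base]
        if hits:  # return only the minimal-size interventions
            return hits
    return []

def protocol_counterfactuals(evidence, protocol,
                             alternatives=("unanimous", "majority")):
    """Alternative protocols that flip the decision under the same clauses."""
    base = decide(clauses(evidence), protocol)
    return [p for p in alternatives
            if p != protocol and decide(clauses(evidence), p) != base]

evidence = {"income": 62_000, "debt_ratio": 0.48, "missed_payments": 0}
print(decide(clauses(evidence), "unanimous"))          # False: debt_ok fails
print(clause_counterfactuals(evidence, "unanimous"))   # [('debt_ok',)]
print(protocol_counterfactuals(evidence, "unanimous")) # ['majority']
```

The example shows the layered-verification point: the clause-level check never touches the raw evidence, and the protocol-level check reuses the same clause outcomes, so each class is audited independently of the full pipeline.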
We extend the framework to adversarial settings where the counterparty may strategically withhold or misrepresent evidence, distinguishing “the counterparty was truly risky” from “the counterparty strategically withheld information.”
An illustrative experiment applies the framework to a lending simulator, tracing decision boundaries and demonstrating non-linear, asymmetric counterfactual structure in a realistic multi-agent setting.
Authors: Martin Lotz (University of Warwick), Pietro Aluffi (University of Warwick), Marya Bazzi (Sea.dev), Matt Arderne (Sea.dev), Vladimirs Murevics (Sea.dev)
Working Draft — March 2026