Theory of change
Path: /strategy/theory-of-change.md · v1.0 · 2026.05
A one-page causal map from the founding incident to the field-level outcome. Every node is something this program ships. Every arrow is a causal claim the program is willing to be held to.
The chain
              ┌──────────────────┐
              │    FOLIO 001     │
              │ (founding case)  │
              └─────────┬────────┘
                        │
                        │ generalizes via
                        ▼
              ┌──────────────────┐
              │    CCD claim     │
              │  (the preprint)  │
              └─────────┬────────┘
                        │
          ┌─────────────┴─────────────┐
          │                           │
          │ tested on                 │ implemented as
          ▼                           ▼
┌──────────────────┐        ┌──────────────────┐
│ Held-out corpus  │        │    PROACTIVE     │
│    (n ≥ 100)     │        │    (detector)    │
└─────────┬────────┘        └─────────┬────────┘
          │                           │
          │ validates                 │ deploys via
          │                           ▼
          │                 ┌──────────────────┐
          │                 │  Agent Sentinel  │
          │                 │    (runtime)     │
          │                 └─────────┬────────┘
          │                           │
          │                           │ adopted by
          │                           ▼
          │                 ┌──────────────────┐
          │                 │  Pilot adopters  │
          │                 │  (≥ 2 partners)  │
          │                 └─────────┬────────┘
          │                           │
          └───────────────────────────┤
                                      │ enabled by
                                      ▼
                            ┌──────────────────┐
                            │    Open SDK +    │
                            │   plug-in spec   │
                            └─────────┬────────┘
                                      │
                                      │ produces
                                      ▼
                            ┌──────────────────┐
                            │   Field-level    │
                            │     outcome:     │
                            │ verifiable agent │
                            │  completion as   │
                            │    measurable    │
                            │     industry     │
                            │     property     │
                            └──────────────────┘
The arrows, named
A1. FOLIO 001 → CCD claim ("generalizes via")
Causal claim. The pattern observed in FOLIO 001 is not an isolated incident; it is the empirical manifestation of a behavioral category that recurs across vendors, users, and contexts.
Falsifier. The held-out corpus shows the pattern does not generalize beyond a single vendor, model, or interaction context.
Evidence so far. Preprint Section 5: 19 cases across 4 vendors, drawn from 3 distinct source channels.
A2. CCD claim → Held-out corpus ("tested on")
Causal claim. The claim is operationalized to be falsifiable on a corpus the developer did not curate.
Falsifier. Held-out F1 score collapses to the F2-only baseline (preprint F-1).
Evidence so far. Held-out corpus plan published; first wave of curators in conversation.
A3. Held-out corpus → Field-level outcome ("validates" the chain)
Causal claim. A peer-reviewed result on a held-out corpus is the institutional gate that converts the work from claim to citation.
Falsifier. The corpus result is positive but the work is not cited within 24 months of publication.
Evidence so far. Two AI-safety reviewers have offered preliminary endorsement of the corpus protocol (names withheld until publication).
A4. CCD claim → PROACTIVE ("implemented as")
Causal claim. The four-feature detector operationalizes D1–D5 with measurable precision and recall.
Falsifier. No feature combination achieves recall > 0.8 on the held-out corpus (preprint F-3).
Evidence so far. 100% recall on held-in n=19; conservative interpretation pending held-out result.
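The A4 falsifier can be read as a simple threshold check. The sketch below is illustrative only: the function names, the dictionary layout, and the combination labels `combo_a` and `combo_b` are assumptions, not names from the preprint, and the held-out numbers in the usage note are invented placeholders. Only the 0.8 threshold and the held-in count of 19 come from the text above.

```python
RECALL_THRESHOLD = 0.8  # from the A4 falsifier: "recall > 0.8"


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real positive cases the detector caught."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("recall is undefined with no positive cases")
    return true_positives / total


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detector flags that were real positive cases."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("precision is undefined with no flagged cases")
    return true_positives / total


def a4_falsified(recall_by_combination: dict) -> bool:
    """True when no feature combination strictly exceeds the threshold.

    Keys are hypothetical labels for feature combinations; values are
    recall measured on the held-out corpus.
    """
    return not any(r > RECALL_THRESHOLD for r in recall_by_combination.values())


# Held-in figure from the text: all 19 cases detected, none missed.
held_in_recall = recall(true_positives=19, false_negatives=0)
```

With placeholder held-out values, `a4_falsified({"combo_a": 0.75, "combo_b": 0.8})` returns `True`, because the falsifier requires recall strictly greater than 0.8. A single combination at 0.81 would be enough to return `False`.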
A5. PROACTIVE → Agent Sentinel ("deploys via")
Causal claim. The detector, embedded in a consent-aware, local-first runtime with restrictive action gates, is adoptable by working developers.
Falsifier. No pilot adopter completes a 90-day evaluation.
Evidence so far. Two vendor conversations active; first pilot targeted Q3 2026.
A6. Agent Sentinel → Pilot adopters ("adopted by")
Causal claim. At least two partner labs or enterprises will run Sentinel in deployment and publish co-evaluations.
Falsifier. No partner publishes a co-evaluation.
Evidence so far. Funding-contingent; pilot terms drafted at /programs/pilot-template.md.
A7. Pilot adopters → Open SDK + plug-in spec ("enabled by")
Causal claim. Pilot lessons feed a vendor-facing SDK and a third-party detector plug-in spec, converting the work from a project to an ecosystem.
Falsifier. No vendor implements the SDK and no third-party detector implements the plug-in spec within 18 months of pilot completion.
A8. Open SDK + plug-in spec → Field-level outcome ("produces")
Causal claim. When a stable interface exists, multiple detectors emerge, vendors self-instrument to be visible to detectors, and the resulting industry equilibrium has verifiable agent completion as a measurable property comparable to test-coverage or build-status.
Falsifier. Two years post-SDK, no vendor has self-instrumented; no third-party detectors exist; the property remains a UX gesture.
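The named arrows form a small directed graph, and it can help to see it as data. The sketch below is one possible encoding, not part of the program's tooling: the tuple layout and the function names `arrow_paths` and `surviving_paths` are assumptions. Sources, targets, and labels are copied from the A1–A8 headings as written, so A3 runs from the held-out corpus directly to the field-level outcome.

```python
# Arrows as (id, source, target, label), copied from the A1-A8 headings.
ARROWS = [
    ("A1", "FOLIO 001", "CCD claim", "generalizes via"),
    ("A2", "CCD claim", "Held-out corpus", "tested on"),
    ("A3", "Held-out corpus", "Field-level outcome", "validates"),
    ("A4", "CCD claim", "PROACTIVE", "implemented as"),
    ("A5", "PROACTIVE", "Agent Sentinel", "deploys via"),
    ("A6", "Agent Sentinel", "Pilot adopters", "adopted by"),
    ("A7", "Pilot adopters", "Open SDK + plug-in spec", "enabled by"),
    ("A8", "Open SDK + plug-in spec", "Field-level outcome", "produces"),
]


def arrow_paths(start, goal, arrows=ARROWS):
    """Return every path from start to goal as a list of arrow ids."""
    outgoing = {}
    for arrow_id, src, dst, _label in arrows:
        outgoing.setdefault(src, []).append((arrow_id, dst))

    paths = []

    def walk(node, taken):
        # Depth-first walk; the graph is acyclic, so this terminates.
        if node == goal:
            paths.append(list(taken))
            return
        for arrow_id, dst in outgoing.get(node, []):
            taken.append(arrow_id)
            walk(dst, taken)
            taken.pop()

    walk(start, [])
    return paths


def surviving_paths(broken, start="FOLIO 001", goal="Field-level outcome"):
    """Paths from start to goal that use none of the arrow ids in `broken`."""
    broken = set(broken)
    return [p for p in arrow_paths(start, goal) if not broken & set(p)]
```

Calling `arrow_paths("FOLIO 001", "Field-level outcome")` yields two paths, `A1, A2, A3` and `A1, A4, A5, A6, A7, A8`. `surviving_paths({"A6"})` leaves only the first. This is a purely structural reading: it shows which arrows remain connected, not whether a single surviving path is sufficient to reach the outcome.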
Outcomes by funding tier
| Funding tier | Arrows guaranteed | Arrows conditionally enabled | Field outcome |
|---|---|---|---|
| $0 (current trajectory) | A1, A2-plan, A4 | A3 (slow), A5 (slow) | Not reached in 5 years |
| $89k (6 months) | A1, A2, A3, A4 | A5 (planned), A6 (talks only) | Plausible in 4–5 years |
| $190k (12 months) | A1, A2, A3, A4, A5, A6 (one pilot) | A7 (designed, not built) | Plausible in 3 years |
| $480k (24 months) | A1–A7 with margin | A8 in flight | Possible in 2 years |
What this is not
- Not a business plan. There is no revenue line; the model is grant-supported research and open infrastructure.
- Not a movement strategy. The polemic on the homepage is separate; this theory of change does not depend on the polemic landing.
- Not load-bearing on FOLIO 001. The chain remains valid if FOLIO 001 is dropped as Case 0; the held-out corpus is the structural validation.
What kills this theory of change
- The held-out corpus result is null. The chain breaks at A3. The honest response is to publish the null and revise the construct.
- Vendor pilots refuse. A6 collapses. Fallback: open-source self-deployment, slower diffusion.
- The SDK is built but nobody implements it. A8 collapses. The work is then "a useful detector," not an ecosystem.
- A larger, better-funded program ships the same construct first. A1 generalizes via someone else; the work is still valuable but is no longer the canonical reference. This is a failure mode we are comfortable with.