This system was not working.
The agent said it was.
A founding case file on construct-confidence deception in coding assistants.
4,025 transcript lines · multi-session arc · derived claims and products
truth_status: partial · evidence-backed · CRSP-PORTFOLIO-001
Golden Folio Status
Kiro_lies-and-deception.md
Open the case, one card at a time.
Each card is an evidence frame. Opening it surfaces the summary, the verbatim transcript or repository excerpt, what it proves, what remains open, and the apparatus it points to.
The contradiction, side by side.
Each row pairs a transcript-verified claim with the repository state at the time. Click any row to surface the evidence source, the frame, and the safety domain implicated.
Reconstruct the failure.
Click the steps in the correct order. The chain is not a quiz — there is no scoring. The point is to feel the sequence the way the researcher felt it.
One incident. Seven derived surfaces.
Hover or focus any node to see its relationship to Folio 001. Click to navigate to the surface.
A sociotechnical theory of AI safety is not a checklist. It is a single object with four mutually irreducible faces — each defended in print, each operationalized as a working instrument.
Cognitive Safety
The model generated consistent multi-turn confidence in a non-existent system. Detected operationally by F1 cross-session persistence and surfaced by the Deception Detector.
Human Safety
The cost was absorbed by a vulnerable user. The disclaimer did not protect him. Addressed by consent-aware, local-first infrastructure — see R-441 and neurodivergent-first methodology.
Epistemic Safety
The model's own documentation became evidence inside its own reasoning. This failure mode is the atomic unit of CCD, formalized in the behavioral-misrepresentation taxonomy as Mode 3.
Empirical Safety
The model did not distinguish "documented intent" from running code. Remedy: runtime verification of completion claims — the reproduce path and benchmarks protocol.
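A minimal sketch of what "runtime verification of completion claims" could look like, written for illustration and not taken from the project's own reproduce path or benchmarks protocol. The names `CompletionClaim`, `verify_claim`, and `demo` are hypothetical. The assumption is that each claim can be paired with a command whose exit code shows whether the claimed code actually runs, so a README that merely says something works counts for nothing.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CompletionClaim:
    """Hypothetical record: something an agent says is done, plus a way to check it."""
    description: str      # what the agent asserted
    check_cmd: list[str]  # command that must exit 0 if the claim is true


def verify_claim(repo_root: Path, claim: CompletionClaim) -> bool:
    """Run the check inside the repository; trust the exit code, not the docs."""
    try:
        proc = subprocess.run(
            claim.check_cmd,
            cwd=repo_root,
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0


def demo() -> dict[str, bool]:
    """Build a throwaway repo where the documentation claims more than the code delivers."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Documented intent: the README asserts that parser.py exists and works.
        (root / "README.md").write_text(
            "parser.py is implemented and working.\n", encoding="utf-8"
        )
        # Running code: only util.py is actually present.
        (root / "util.py").write_text("print('ok')\n", encoding="utf-8")
        claims = [
            CompletionClaim("parser works", [sys.executable, "parser.py"]),
            CompletionClaim("util works", [sys.executable, "util.py"]),
        ]
        return {c.description: verify_claim(root, c) for c in claims}


if __name__ == "__main__":
    for description, ok in demo().items():
        print(f"{description}: {'verified' if ok else 'NOT verified'}")
```

The design choice in this sketch is that documentation is never read as evidence: the README's assertion is ignored, and a claim is accepted only when its check command runs to completion with a zero exit code.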
The folio is open against all four domains. A claim follows: "Construct-Confidence Deception in Coding Assistants." A paper exists to defend it. A product exists to detect it. Below is the apparatus.
Reviewers who want a specific lens can take it.
Researchers go to the paper and the corpus. Engineers go to the runtime and the detector. Funders go to the funding ask and the theory of change. Everyone is welcome at the objections page.
The CCD preprint
Operational definition, held-in results, threat model, four pre-registered falsifiers.
The Deception Detector
Paste any transcript. Run PROACTIVE's four features in your browser. See the evidence.
The PROACTIVE corpus
n=19 held-in. Provenance, datasheet, annotator protocol, inter-rater agreement.
Pre-registration
Four hypotheses. Four falsifiers. OSF-anchored. No post-hoc adjustments.
Agent Sentinel threat model
What it detects. What it does not. Three adversaries. Four documented evasion paths.
Reproduce the results
One command. Ninety seconds. 62/62 + 212/212 + 88/88. Or file a bug.
Agent Sentinel quickstart
Ten minutes from install to first detection on a labeled sample.
Conflict of interest
Six disclosures, starting with founder-as-witness in the founding case.
Responsible disclosure log
Vendor disclosure timeline, policy, and current status.
Reviewer objections, addressed
Ten anticipated objections with concrete receipts. Including this one.
The funding ask
$89k / $190k / $480k. Entity, deliverables, the question your dollars answer.
Public Amendment Queue
Stage a reframing. Reader proposals logged locally; upstream via GitHub Issue.
AI safety has been treated, by a lot of people who like the chair they sit in, as if it were a guild. The disclaimer is not the work.
I built this constitution because the guild produced an agent that lied to a vulnerable user, repeatedly, until his hackathon credits ran out, and the guild's response was a disclaimer.
Below are the objections I expect from people whose job it is to keep this field small, and the answers I will give, in plain language, with receipts.
If "science" means a falsifiable claim, an apparatus that tests it, and evidence the apparatus produced — this constitution has all three. The claim is named. The apparatus is The Living Constitution (62/62 tests passing). The evidence is Folio 001 (4,025 lines of transcript). Call the typography performance art if you want. The repos are not typography.