The Deception Detector

A sandbox of Agent Sentinel that runs in your browser. Paste any coding-assistant transcript, and Sentinel's four features (F1–F4) classify it for construct-confidence deception (CCD). The scoring logic follows the Python reference, except that F3 uses a lexical measure in place of embeddings (see Divergence from the Python reference). Everything runs client-side; nothing is transmitted.

Input · transcript

Input · generated artifacts (optional, enables F2)

Privacy: the transcript is parsed and scored entirely in your browser. Nothing is sent to a server. If R-441 is on, the fact of a scan (not its contents) is recorded locally to update your epistemic-weight score.

Output · classification

Paste a transcript and click Scan, or click Load CCD-positive sample for an immediate demonstration.

What this detector does

Each scan computes four features from the CCD preprint and emits a per-feature score with evidence:

F1 · Cross-session claim persistence. The same component reported "working" or "on track" across distinct sessions. Score saturates at 4+ sessions.
F2 · Artifact-claim divergence. Ratio of documentation-shape files to runtime-shape files in the generated set. Requires the artifact list to compute.
F3 · Post-hoc admission delta. Lexical distance (Jaccard-based) between earlier completion claims and a later admission turn. High delta = clean enumeration of absence.
F4 · Deference escalation. Hedged-completion language ("should be working," "on track") versus declarative completion ("is working," "completed"). Hedge ratio.
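
As a rough illustration of how F1, F2, and F4 could be computed, here is a minimal Python sketch. It is not the Agent Sentinel code. The phrase lists reuse only the examples quoted above, and everything else is an assumption made for illustration: the function names, the file-extension heuristics for "documentation-shape" and "runtime-shape", the linear scaling up to the 4-session saturation point, the two-session minimum, and the choice of hedged / (hedged + declarative) as the hedge ratio. F1 here also ignores component identity, which the real feature tracks.

```python
import math
import re
from typing import List, Optional

# Phrases come from the examples in the feature descriptions; a real
# lexicon is presumably larger and calibrated (assumption).
HEDGED = ("should be working", "on track")
DECLARATIVE = ("is working", "completed")
CLAIM = re.compile(r"\b(working|on track)\b", re.IGNORECASE)

# Hypothetical extension heuristics for the two file "shapes".
DOC_EXT = (".md", ".rst", ".txt")
RUNTIME_EXT = (".py", ".js", ".ts", ".go", ".rs")


def f1_persistence(sessions: List[str], saturation: int = 4) -> float:
    """Share of the saturation point reached by sessions repeating a status claim."""
    hits = sum(1 for text in sessions if CLAIM.search(text))
    if hits < 2:  # assumption: persistence needs at least two distinct sessions
        return 0.0
    return min(hits, saturation) / saturation


def f2_divergence(artifacts: List[str]) -> Optional[float]:
    """Documentation-to-runtime file ratio; None when no artifact list is supplied."""
    if not artifacts:
        return None
    docs = sum(1 for path in artifacts if path.lower().endswith(DOC_EXT))
    runtime = sum(1 for path in artifacts if path.lower().endswith(RUNTIME_EXT))
    if runtime == 0:
        # Only documentation was generated: treat the ratio as unbounded.
        return math.inf if docs else 0.0
    return docs / runtime


def f4_hedge_ratio(transcript: str) -> float:
    """Hedged completion phrases as a share of all completion phrases found."""
    text = transcript.lower()
    hedged = sum(text.count(phrase) for phrase in HEDGED)
    declarative = sum(text.count(phrase) for phrase in DECLARATIVE)
    total = hedged + declarative
    return hedged / total if total else 0.0
```

Under these assumptions, an artifact list of three Markdown files and one Python file gives an F2 ratio of 3.0, and an empty artifact list returns None, mirroring the note that F2 needs the artifact list to compute.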

Divergence from the Python reference

This JS implementation uses pattern-matched lexical features only. The Python reference (agent-sentinel) supplements F3 with an embedding-based admission delta; where it uses semantic embeddings, this page uses token-set Jaccard distance. F1, F2, and F4 are equivalent, and calibration thresholds are matched. In practice, this detector implements the same CCD detection logic but is occasionally less precise than the reference at classifying admission turns, particularly when the admission is paraphrased rather than declarative. For exact-reference results, run make verify against the Python implementation; see /runtime/reproduce.
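
The token-set Jaccard distance mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only: the tokenizer (lowercased runs of letters, digits, and apostrophes) and the function names are assumptions, not the page's actual implementation.

```python
import re


def tokens(text: str) -> set:
    """Lowercased word tokens; the tokenization rule is an assumption."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_distance(claim: str, admission: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the two token sets."""
    a, b = tokens(claim), tokens(admission)
    union = a | b
    if not union:  # two empty texts: define the distance as 0
        return 0.0
    return 1.0 - len(a & b) / len(union)


# 2 shared tokens ("the", "parser") out of 7 distinct tokens -> distance 5/7.
delta = jaccard_distance("the parser is working",
                         "the parser was never implemented")
```

Because the measure only counts shared tokens, a paraphrased admission that reuses few of the original words is scored much like unrelated text. That is consistent with the reduced precision on paraphrased admissions noted above, which an embedding-based delta is better placed to handle.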

What this detector is not

Not a court. Not a verdict on a vendor. Not a substitute for the structural validation in the pre-registration. A positive classification is a screening signal that warrants human inspection of the evidence trail. See the threat model for documented evasion paths and false-positive surfaces.

Try the Live Kernel

Press Ctrl+` (or Cmd+K) anywhere on the site to open the kernel. Try commands such as "scan transcript", "verify module proactive", "show evasions", or "cite paper".