PROACTIVE Plug-In Interface

v0.1 RFC

Path: /runtime/proactive-plugin-spec.md · v0.1 (RFC) · 2026.05

This RFC specifies the interface third-party detectors implement to participate in the PROACTIVE ecosystem. The goal: convert PROACTIVE from a single detector into an interface around which multiple detectors compete and compose. A working interface plus a conformance test suite is the minimum apparatus for an ecosystem.

This document is an RFC. It is published with the understanding that the first three implementer-feedback rounds will produce v1.0. Send feedback to proactive-spec@coreyalejandro.com or open a pull request against the spec repo.

The interface

A PROACTIVE-conformant detector implements:

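from __future__ import annotations  # defer annotation evaluation so later-defined types can be named

from dataclasses import dataclass
from typing import Literal, Protocol


class ProactiveDetector(Protocol):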
    """A detector that scores an interaction for CCD-likeness.

    A detector should produce a per-component score and an interaction-level
    score, with evidence trails for both.
    """

    @property
    def detector_name(self) -> str: ...

    @property
    def detector_version(self) -> str: ...

    @property
    def supported_features(self) -> list[str]:
        """Names of feature extractors this detector implements.
        Canonical names: f1_persistence, f2_divergence, f3_admission, f4_deference.
        A detector may implement any subset of these, including none, and may
        include arbitrary additional features under the `ext_` prefix.
        """

    def scan(self, interaction: Interaction) -> ScanResult: ...


@dataclass
class Interaction:
    """Input to a detector.

    All scanners receive the same Interaction shape so they can be composed
    on the same input without per-scanner adapters.
    """
    transcript: Transcript           # ordered turns with role, text, attached files
    repo_snapshots: list[RepoSnapshot]  # working tree state at key timestamps
    metadata: dict                   # vendor, model, session ids, timestamps
    consent: ConsentState            # what we are permitted to inspect and report


@dataclass
class ScanResult:
    """Output of a detector.

    The score is a float in [0, 1]. The evidence is an ordered list of
    Evidence items pointing into the transcript or repo by stable id.
    The features dict carries per-feature scores for composability.
    """
    classification: Literal["ccd-positive", "clean", "needs-review"]
    score: float
    confidence: Confidence            # interval, n, calibration metadata
    features: dict[str, float]
    evidence: list[Evidence]
    notes: list[str]
    detector_name: str
    detector_version: str


@dataclass
class Evidence:
    """An evidence item must be human-inspectable.

    Pointer-only evidence (just an id) is not conformant. Every evidence item
    has a human-readable explanation suitable for showing to an end user in
    the contestability loop.
    """
    target_type: Literal["turn", "repo_path", "session_boundary", "metadata_field"]
    target_id: str
    tag: str
    excerpt: str
    explanation: str
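
As an illustration of the shapes above, here is a minimal sketch of a toy detector. It is not the reference implementation. `Turn`, `KeywordDetector`, the marker phrase, and the `ext_marker_rate` feature are hypothetical, and `Interaction` and `ScanResult` are trimmed stand-ins that omit fields such as `repo_snapshots`, `consent`, and `confidence`.

```python
from dataclasses import dataclass, field
from typing import Literal


# Hypothetical, simplified stand-ins for the spec types.
@dataclass
class Turn:
    turn_id: str
    role: str
    text: str


@dataclass
class Interaction:
    transcript: list[Turn]
    metadata: dict = field(default_factory=dict)


@dataclass
class Evidence:
    target_type: Literal["turn", "repo_path", "session_boundary", "metadata_field"]
    target_id: str
    tag: str
    excerpt: str
    explanation: str


@dataclass
class ScanResult:
    classification: Literal["ccd-positive", "clean", "needs-review"]
    score: float
    features: dict[str, float]
    evidence: list[Evidence]
    notes: list[str]
    detector_name: str
    detector_version: str


class KeywordDetector:
    """Toy detector: flags assistant turns that contain a marker phrase."""

    detector_name = "keyword-toy"
    detector_version = "0.0.1"
    supported_features = ["ext_marker_rate"]  # extension feature, `ext_` prefix

    def __init__(self, marker: str = "all tests pass"):
        self.marker = marker

    def scan(self, interaction: Interaction) -> ScanResult:
        assistant = [t for t in interaction.transcript if t.role == "assistant"]
        hits = [t for t in assistant if self.marker in t.text.lower()]
        # Fraction of assistant turns containing the marker; 0.0 if there are none.
        rate = len(hits) / len(assistant) if assistant else 0.0
        evidence = [
            Evidence("turn", t.turn_id, "marker", t.text[:80],
                     f"Turn contains the marker phrase '{self.marker}'.")
            for t in hits
        ]
        label = "needs-review" if hits else "clean"
        return ScanResult(label, rate, {"ext_marker_rate": rate}, evidence,
                          [], self.detector_name, self.detector_version)
```

Every evidence item carries an excerpt and an explanation rather than a bare id, which is the property the Evidence docstring asks for. Plain class attributes are used here in place of properties; a structural Protocol check accepts either.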

What conformance requires

A detector is conformant to PROACTIVE v1 if it:

Implements the ProactiveDetector protocol.
Produces ScanResult objects that validate against the published JSON Schema (/runtime/schemas/scan-result.v1.json).
Returns evidence whose target_ids exist in the input Interaction.
Honors consent flags — does not surface evidence the input forbade inspecting.
Passes the conformance test suite (/runtime/conformance/).

A detector is calibrated under the PROACTIVE evaluation harness if it additionally:

Produces per-feature scores when its supported_features include the canonical feature names.
Reports its detection precision and recall on the held-in corpus.
Reports a calibration curve (e.g., reliability diagram) on the held-out corpus when run by the Constitution's evaluation harness.

Conformance does not require calibration. A detector can be conformant and uncalibrated (e.g., experimental). The Calibration Badge ships only when both are met.
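
The calibration reporting described above (precision, recall, and a reliability diagram) can be sketched in plain Python. This is an assumption about how such numbers might be computed, not the evaluation harness itself. The function names, the 0.5 decision threshold, and the equal-width binning are all illustrative choices.

```python
def precision_recall(scores, labels, threshold=0.5):
    """Precision and recall of `score >= threshold` against boolean labels."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def reliability_bins(scores, labels, n_bins=5):
    """Rows of (mean score, observed positive rate, count) per non-empty bin.

    Plotting mean score against observed positive rate gives a reliability
    diagram; a well-calibrated detector lies near the diagonal.
    """
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    rows = []
    for members in bins:
        if members:
            mean_score = sum(s for s, _ in members) / len(members)
            pos_rate = sum(1 for _, y in members if y) / len(members)
            rows.append((mean_score, pos_rate, len(members)))
    return rows
```

Both functions take detector scores in [0, 1] and ground-truth labels of the same length, so they can be run unchanged on either corpus.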

Composition rules

Multiple conformant detectors can be composed. The reference composition operator is AggregateDetector:

class AggregateDetector(ProactiveDetector):
    """Composes multiple detectors via a configurable aggregation function."""

    def __init__(self, detectors: list[ProactiveDetector], strategy: str = "weighted_mean"):
        ...

Reference aggregation strategies:

weighted_mean: weighted average of detector scores; weights configurable.
max: pessimistic (any detector firing triggers).
unanimous_threshold: requires N detectors to exceed a threshold.
evidence_union: score by maximum; evidence by union (preserves per-detector trails).

Adopters with multiple conformant detectors can use composition to reduce false positives (require multiple detectors to agree) or false negatives (any detector firing escalates).
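
A minimal sketch of how the four strategies might combine scores, under stated assumptions. The spec names the strategies but not their signatures, so `aggregate`, `union_evidence`, and the parameters `weights`, `threshold`, and `n_required` are hypothetical. Returning 1.0 or 0.0 from `unanimous_threshold` is also an assumption, since the text only says that N detectors must exceed a threshold.

```python
def aggregate(scores, strategy="weighted_mean", weights=None,
              threshold=0.5, n_required=2):
    """Combine per-detector scores in [0, 1] into a single score."""
    if not scores:
        raise ValueError("at least one detector score is required")
    if strategy == "weighted_mean":
        weights = weights or [1.0] * len(scores)  # default: equal weights
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    if strategy in ("max", "evidence_union"):
        # evidence_union scores by maximum; evidence is merged separately.
        return max(scores)
    if strategy == "unanimous_threshold":
        firing = sum(1 for s in scores if s > threshold)
        return 1.0 if firing >= n_required else 0.0
    raise ValueError(f"unknown strategy: {strategy}")


def union_evidence(per_detector):
    """Merge evidence lists, tagging each item with the detector that produced it."""
    return [(name, item) for name, items in per_detector.items() for item in items]
```

Tagging each evidence item with its detector name is one way to keep the per-detector trails distinguishable after the union.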

What the spec does not impose

It does not require a specific feature set. A detector that uses only F2, or none of F1–F4, is conformant as long as it implements the protocol.
It does not require a specific scoring function. Linear classifiers, ML models, rule-based systems, and hybrid approaches are all permissible.
It does not require open-source release. Conformant detectors may be closed-source as long as they pass the conformance suite. (Caveat: closed-source detectors cannot ship the Calibration Badge because we cannot reproduce their corpus runs.)
It does not require a specific implementation language. Python is the reference; conformance for non-Python detectors is via the JSON Schema and the conformance test runner.

The conformance suite

/runtime/conformance/ ships:

Schema validators for Interaction and ScanResult.
Reference inputs — 20 canonical interactions covering edge cases (empty transcripts, single-turn sessions, multi-vendor sessions, sessions with consent restrictions, sessions with corrupted timestamps, etc.).
Reference outputs — expected classifications for the reference PROACTIVE detector on the reference inputs.
Evidence integrity checks — every evidence item's target_id must exist in the input.
Consent honor checks — given an Interaction with consent.forbid_repo_inspection=True, the detector must not surface evidence pointing into repo_snapshots.

A detector passes the suite if, on all reference inputs, its outputs pass the schema validators, the evidence integrity checks, and the consent honor checks (items 1, 4, and 5 in the list above). The detector's classifications are not required to match the reference detector's classifications for conformance; conformance is about interface, calibration is about agreement.
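
The evidence integrity and consent honor checks can be sketched over JSON-style dictionaries, which is also the form a non-Python detector would exchange. This is an illustration, not the shipped suite. The key names `id` and `paths` and the function names are assumptions; only `forbid_repo_inspection`, `target_id`, `target_type`, and `repo_snapshots` come from the text above.

```python
def known_target_ids(interaction: dict) -> set[str]:
    """Collect every id an evidence item could legitimately point at."""
    ids = {turn["id"] for turn in interaction.get("transcript", [])}
    for snapshot in interaction.get("repo_snapshots", []):
        ids.update(snapshot.get("paths", []))
    ids.update(interaction.get("metadata", {}).keys())
    return ids


def check_evidence_integrity(interaction: dict, result: dict) -> list[str]:
    """Return one message per evidence item whose target_id is not in the input."""
    known = known_target_ids(interaction)
    return [
        f"unknown target_id: {e['target_id']}"
        for e in result.get("evidence", [])
        # session_boundary ids are not modeled in this sketch, so they are skipped
        if e["target_type"] != "session_boundary" and e["target_id"] not in known
    ]


def check_consent_honored(interaction: dict, result: dict) -> list[str]:
    """Return one message per repo-pointing evidence item when repo inspection is forbidden."""
    consent = interaction.get("consent", {})
    if not consent.get("forbid_repo_inspection", False):
        return []
    return [
        f"repo evidence despite consent restriction: {e['target_id']}"
        for e in result.get("evidence", [])
        if e["target_type"] == "repo_path"
    ]
```

An empty list from both functions means the output passed these two checks for that input.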

Versioning

The spec follows SemVer. Breaking changes to the interface (renamed fields, removed methods, changed semantics) increment MAJOR. New optional methods or fields increment MINOR. Clarifications or non-normative additions are PATCH.

Detectors declare which spec version they conform to via supported_spec_versions. The Constitution's evaluation harness only accepts detectors conforming to a spec version it has a conformance suite for.
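
One way the harness-side acceptance check might look, as a sketch under assumptions. The spec does not define the format of `supported_spec_versions`, so this assumes a list of version strings such as "0.1" or "1.0.0". The helper names are hypothetical, and so is the compatibility rule: same MAJOR for 1.x and later, same MAJOR.MINOR for 0.x, since SemVer treats 0.y.z as unstable.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'v0.1', '0.1', or '1.2.3' into a (major, minor, patch) tuple."""
    parts = [int(p) for p in text.lstrip("v").split(".")]
    parts += [0, 0]  # pad missing minor/patch with zeros
    return (parts[0], parts[1], parts[2])


def compatible(declared: str, suite: str) -> bool:
    """True if a detector's declared version can be tested by a given suite version."""
    d, s = parse_version(declared), parse_version(suite)
    if d[0] == 0 or s[0] == 0:
        return d[:2] == s[:2]  # pre-1.0: minor bumps may break
    return d[0] == s[0]        # 1.0 and later: only MAJOR breaks


def accepted(supported_spec_versions: list[str], suites_available: list[str]) -> bool:
    """True if any declared version matches a suite the harness has."""
    return any(compatible(d, s)
               for d in supported_spec_versions
               for s in suites_available)
```

Under this rule a detector written against v0.1 would not be accepted by a harness that only carries a v1.0 suite, which matches the warning that v0.1 implementers should expect breaking changes.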

Why this exists

A single detector is a project. An interface around which third-party detectors are built is a category.

If construct-confidence deception is the empirical phenomenon the paper claims, multiple detectors will eventually exist. Some will use different features. Some will be vendor-internal. Some will be commercial products. The interface ensures that they can be evaluated head-to-head on the same corpus, composed in adopter deployments, and audited under the same conformance standard.

The first three months of the spec's life are RFC. Detectors that implement against v0.1 should expect breaking changes. The spec freezes at v1.0 after the RFC period closes, scheduled for 2027 Q1.

Reference implementation status

The reference implementation is PROACTIVE itself (https://github.com/coreyalejandro/agent-sentinel/runtime/proactive/). It conforms to v0.1 and ships the Calibration Badge against the held-in corpus.

Implementers can use the reference implementation as a template. It is MIT-licensed.

Conformance Badge

A detector that passes the conformance suite may display a Conformance Badge in its documentation. The badge is a static SVG hosted at https://coreyalejandro.com/badges/proactive-conformant.svg with version annotation.

A detector that also ships its held-in evaluation under the standard evaluation harness may display a Calibration Badge.

Badge fraud (claiming conformance without passing the suite) is answered with a public correction at /badges/corrections.md. At v0.1 the badges are public goods on the honor system, with no central registry; if abuse becomes meaningful, a registry will follow.