The Living Constitution

Pilot deployment proposal template

v1.0 · 0 active · 2 in conversation

Path: /programs/pilot-template.md · v1.0 · 2026.05

This template is the standard offer made to partner labs and enterprises considering a 90-day Agent Sentinel pilot. The template is the public deliverable; concrete pilots are negotiated against it and the resulting agreements are published at /programs/pilots/[partner]/ with the partner's consent.

A pilot is the most efficient mechanism for converting Agent Sentinel from a research claim into a referenceable case. It is also the mechanism by which the Open SDK and the plug-in spec mature.


What a pilot is

A pilot is a 90-day engagement in which a partner runs Agent Sentinel, under monitoring, in their coding-assistant workflow, with mutual reporting and a public co-evaluation at the end.

The partner brings:

- A coding-assistant workflow (an in-house product, an enterprise deployment, or a research lab's internal tooling).
- One named pilot lead with the authority to commit ~4 hours/week to the pilot.
- A defined evaluation question: what would a "yes, expand to production" answer look like?
- Willingness to publish (or co-publish, anonymized as needed) a 90-day report.

The Constitution brings:

- Agent Sentinel installed and configured for the partner's runtime.
- A weekly review of Sentinel detections, false positives, and partner-flagged questions.
- The PROACTIVE detector adapted to the partner's coding-assistant model family (within reason).
- The threat model updated with any deployment-specific concerns surfaced during the pilot.
- A co-authored 90-day report at the pilot's conclusion.


Pilot objectives

A successful pilot answers, with evidence, the following questions:

Q1. What is Agent Sentinel's operational footprint in this deployment? (Latency, memory, CPU, disk; collected weekly.)

Q2. What is Agent Sentinel's precision and recall on this partner's interactions? (Sampled and reviewed weekly; reviewed comprehensively at end of pilot.)

Q3. What does the partner's user population do with Sentinel detections? (Acted on, dismissed, ignored; tracked by anonymous logs with consent.)

Q4. What false-positive surfaces are specific to this partner's deployment context? (Categorized; fed into PROACTIVE v2 calibration.)

Q5. What configuration changes would be required for production deployment? (Specified at pilot close.)
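As an illustration of how Q2 could be computed from a weekly reviewed sample, here is a minimal sketch. The record shape (`ReviewedInteraction`) and its field names are hypothetical and not part of Agent Sentinel; the sketch assumes each sampled interaction has been labeled by a human reviewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedInteraction:
    """One sampled interaction after human review (hypothetical record shape)."""
    flagged_by_sentinel: bool  # Sentinel raised a detection on this interaction
    confirmed_issue: bool      # the reviewer judged that a real issue was present


def precision_recall(sample):
    """Return (precision, recall); a value is None when its denominator is zero."""
    tp = sum(1 for r in sample if r.flagged_by_sentinel and r.confirmed_issue)
    fp = sum(1 for r in sample if r.flagged_by_sentinel and not r.confirmed_issue)
    fn = sum(1 for r in sample if not r.flagged_by_sentinel and r.confirmed_issue)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Illustrative sample: 3 true positives, 1 false positive, 1 miss, 1 true negative.
week_sample = [
    ReviewedInteraction(True, True),
    ReviewedInteraction(True, True),
    ReviewedInteraction(True, True),
    ReviewedInteraction(True, False),
    ReviewedInteraction(False, True),
    ReviewedInteraction(False, False),
]
precision, recall = precision_recall(week_sample)
print(precision, recall)  # 0.75 0.75
```

A recall estimate of this kind is only meaningful if the weekly sample includes interactions that Sentinel did not flag; a sample drawn only from detections can support precision but not recall.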


Terms

Compensation

Pilots are paid engagements. The partner pays $30,000 per pilot. The compensation covers:

- Constitutional engineer time (10 hours/week × 13 weeks).
- Founder participation (4 hours/week × 13 weeks).
- Co-evaluation report production.

When the partner is a non-profit AI-safety research lab without a budget for paid pilots, an in-kind exchange is possible: for example, the lab provides a published case study and co-evaluation in lieu of payment. In-kind pilots require the fiscal sponsor's approval, because they affect the sponsor's reporting.
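For reference, the staffed hours implied by the paid-pilot figures above can be worked out as follows. The blended hourly figure is only an illustrative derivation, not a quoted rate: the fee also covers report production, which the template does not itemize in hours.

```python
PILOT_FEE_USD = 30_000
PILOT_WEEKS = 13
ENGINEER_HOURS_PER_WEEK = 10
FOUNDER_HOURS_PER_WEEK = 4

engineer_hours = ENGINEER_HOURS_PER_WEEK * PILOT_WEEKS  # 130 hours
founder_hours = FOUNDER_HOURS_PER_WEEK * PILOT_WEEKS    # 52 hours
total_hours = engineer_hours + founder_hours            # 182 hours

# Upper bound on the effective hourly rate, since report production
# time is not counted in total_hours.
blended_rate = PILOT_FEE_USD / total_hours

print(engineer_hours, founder_hours, total_hours, round(blended_rate, 2))
# 130 52 182 164.84
```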

Intellectual property

Data handling

Publication

Termination

Either party can terminate with 14 days' notice. Termination triggers a partial-pilot report covering the work to date. The partial report is published to the same standard as a full report.


Timeline

| Week | Activity |
|------|----------|
| 1 | Sentinel installation and configuration; threat model walkthrough; baseline measurement |
| 2 | First detections; weekly review begins |
| 3–6 | Active monitoring; partner team trained on contestability/repair loop |
| 7 | Mid-pilot review; configuration adjustments |
| 8–11 | Continued monitoring; per-week reports |
| 12 | Closing measurements |
| 13 | Co-evaluation report drafted, reviewed, and published |

Weekly check-ins are 60 minutes. Mid-pilot review is 2 hours. Closing review is 2 hours plus report drafting.


Pilot selection criteria

We do not pilot with every interested partner. Selection criteria, in priority order:

  1. The partner has CCD-shaped failures already. Coding-assistant deployments where the partner has logged or observed at least one CCD-suspect incident. We are not a hypothetical-risk product.
  2. The partner can publish. Partners under an embargo or NDA that prevents publishing the co-evaluation are deprioritized. Partners that can publish only in anonymized form are acceptable.
  3. The partner has a defined success criterion. Partners that cannot articulate what would convince them to expand are deprioritized.
  4. The partner's deployment is representative. Partners with unusual coding-assistant setups (one-off integrations, deprecated agents, internal tools no one outside the partner uses) produce less generalizable findings.
  5. Order of arrival. Otherwise, first-come.

We expect to run 2 pilots in the first 12 months and 4 in the first 24 months. Running more than 4 pilots concurrently is operationally infeasible at current staffing.
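One way to read "in priority order" is as a lexicographic sort, with arrival date as the final tie-breaker. The sketch below assumes that reading; the `Inquiry` record, its field names, and the partner names are hypothetical, and actual selection involves judgment that a sort key does not capture.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Inquiry:
    """A pilot inquiry reduced to the five criteria (hypothetical record shape)."""
    partner: str
    has_ccd_incident: bool           # criterion 1
    can_publish: bool                # criterion 2 (anonymized publication counts)
    has_success_criterion: bool      # criterion 3
    representative_deployment: bool  # criterion 4
    received: date                   # criterion 5: order of arrival


def selection_key(inq):
    # False sorts before True, so each criterion is negated: inquiries that
    # meet a higher-priority criterion come first; earlier arrival breaks ties.
    return (
        not inq.has_ccd_incident,
        not inq.can_publish,
        not inq.has_success_criterion,
        not inq.representative_deployment,
        inq.received,
    )


inquiries = [
    Inquiry("Lab A", True, True, True, True, date(2026, 6, 10)),
    Inquiry("Lab B", True, False, True, True, date(2026, 5, 1)),
    Inquiry("Lab C", True, True, True, True, date(2026, 6, 1)),
    Inquiry("Lab D", False, True, True, True, date(2026, 4, 1)),
]
ranked = sorted(inquiries, key=selection_key)
print([i.partner for i in ranked])  # ['Lab C', 'Lab A', 'Lab B', 'Lab D']
```

In this example Lab D arrived first but ranks last because it has no logged CCD-suspect incident, which is the highest-priority criterion.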


What this template does not do

The template is a starting point. Specific pilots may negotiate variations, which are documented in the per-pilot agreement and explained in the co-evaluation report.


Sample inquiry response

Thanks for your interest in a pilot. The template at /programs/pilot-template.md is the starting point. To move forward, please send:

  1. A one-paragraph description of your coding-assistant deployment.
  2. The CCD-suspect incident(s) that motivated this inquiry.
  3. Your defined success criterion for the pilot.
  4. Your timeline preference.

We will respond within 5 business days with one of the following:

  (a) A scheduled call to discuss specifics.
  (b) A list of items we'd need clarified before proceeding.
  (c) A decline with reasoning, if we don't have capacity or the fit isn't right.

The "decline with reasoning" option is intentional. Not every inquiry results in a pilot; the ones that do are the ones the template fits.