Pilot deployment proposal template
Path: /programs/pilot-template.md · v1.0 · 2026.05
This template is the standard offer made to partner labs and enterprises considering a 90-day Agent Sentinel pilot. The template is the public deliverable; concrete pilots are negotiated against it and the resulting agreements are published at /programs/pilots/[partner]/ with the partner's consent.
A pilot is the most efficient mechanism for converting Agent Sentinel from a research claim into a referenceable case. It is also the mechanism by which the Open SDK and the plug-in spec mature.
What a pilot is
A 90-day engagement in which a partner runs Agent Sentinel in their coding-assistant workflow, monitored, with mutual reporting and a public co-evaluation at the end.
The partner brings: - A coding-assistant workflow (in-house product, an enterprise deployment, or a research lab's internal tooling). - One named pilot lead with the authority to commit ~4 hours/week to the pilot. - A defined evaluation question: what would a "yes, expand to production" answer look like? - Willingness to publish (or co-publish, anonymized as needed) a 90-day report.
The Constitution brings: - Agent Sentinel installed and configured for the partner's runtime. - A weekly review of Sentinel detections, false positives, and partner-flagged questions. - The PROACTIVE detector adapted to the partner's coding-assistant model family (within reason). - The threat model updated with any deployment-specific concerns surfaced during the pilot. - A co-authored 90-day report at the pilot's conclusion.
Pilot objectives
A successful pilot answers, with evidence, the following questions:
Q1. What is Agent Sentinel's operational footprint in this deployment? (Latency, memory, CPU, disk; collected weekly.)
Q2. What is Agent Sentinel's precision and recall on this partner's interactions? (Sampled and reviewed weekly; reviewed comprehensively at end of pilot.)
Q3. What does the partner's user population do with Sentinel detections? (Acted on, dismissed, ignored; tracked by anonymous logs with consent.)
Q4. What false-positive surfaces are specific to this partner's deployment context? (Categorized; fed into PROACTIVE v2 calibration.)
Q5. What configuration changes would be required for production deployment? (Specified at pilot close.)
Terms
Compensation
Pilots are paid engagements. The partner pays $30,000 per pilot. The compensation covers: - Constitutional engineer time (10 hours/week × 13 weeks). - Founder participation (4 hours/week × 13 weeks). - Co-evaluation report production.
When the partner is a non-profit AI-safety research lab without a budget for paid pilots, an in-kind exchange is possible: e.g., the lab provides a published case study and co-evaluation in lieu of payment. In-kind pilots require sponsor approval (because they affect the fiscal sponsor's reporting).
Intellectual property
- The Constitution retains rights to all detector improvements and PROACTIVE updates produced during the pilot. These are licensed back to the partner under MIT (the existing license).
- The partner retains rights to all of their interaction data, configurations, and any proprietary integrations they build.
- The 90-day report is co-authored and dual-published.
Data handling
- Sentinel's local-first architecture is preserved. No partner interaction data leaves the partner's machines without explicit consent on a per-instance basis.
- Aggregated statistics (counts, rates, no transcripts) may be shared with the Constitution for PROACTIVE calibration. Specific consent required.
- The partner retains the right to redact any portion of the joint report before publication.
Publication
- The 90-day report is published whether the result is favorable to Agent Sentinel or not. This is non-negotiable.
- "Favorable" and "unfavorable" outcomes both count as successful pilots from the Constitution's perspective. The asymmetric harm is suppressed reporting, not negative findings.
- If the partner requires anonymization, the report is published with the partner identified by industry sector and rough scale rather than by name.
Termination
Either party can terminate with 14 days' notice. Termination triggers a partial-pilot report covering the work to date. The partial report is published with the same standard as a full report.
Timeline
| Week | Activity |
|---|---|
| 1 | Sentinel installation and configuration; threat model walkthrough; baseline measurement |
| 2 | First detections; weekly review begins |
| 3–6 | Active monitoring; partner team trained on contestability/repair loop |
| 7 | Mid-pilot review; configuration adjustments |
| 8–11 | Continued monitoring; per-week reports |
| 12 | Closing measurements |
| 13 | Co-evaluation report drafted, reviewed, and published |
Weekly check-ins are 60 minutes. Mid-pilot review is 2 hours. Closing review is 2 hours plus report drafting.
Pilot selection criteria
We do not pilot with every interested partner. Selection criteria, in priority order:
- The partner has CCD-shaped failures already. Coding-assistant deployments where the partner has logged or observed at least one CCD-suspect incident. We are not a hypothetical-risk product.
- The partner can publish. Partners under embargo or NDA that prevents publishing the co-evaluation are deprioritized. Partners that can publish anonymized are acceptable.
- The partner has a defined success criterion. Partners that cannot articulate what would convince them to expand are deprioritized.
- The partner's deployment is representative. Partners with unusual coding-assistant setups (one-off integrations, deprecated agents, internal tools no one outside the partner uses) produce less generalizable learnings.
- Order of arrival. Otherwise, first-come.
We expect to run 2 pilots in the first 12 months and 4 in the first 24 months. More than 4 pilots concurrently is operationally infeasible at current staffing.
What this template does not do
- Does not commit either party to a follow-on engagement.
- Does not bind the partner to adopt Agent Sentinel in production.
- Does not require either party to endorse the other's broader work.
- Does not allow either party to claim the other's findings in marketing without explicit consent.
The template is a starting point. Specific pilots may negotiate variations, which are documented in the per-pilot agreement and explained in the co-evaluation report.
Sample inquiry response
Thanks for your interest in a pilot. The template at
/programs/pilot-template.mdis the starting point. To move forward, please send:
- A one-paragraph description of your coding-assistant deployment.
- The CCD-suspect incident(s) that motivated this inquiry.
- Your defined success criterion for the pilot.
- Your timeline preference.
We will respond within 5 business days with either: (a) A scheduled call to discuss specifics. (b) A list of items we'd need clarified before proceeding. (c) A decline with reasoning, if we don't have capacity or the fit isn't right.
The "decline with reasoning" option is intentional. Not every inquiry results in a pilot; the ones that do are the ones the template fits.