Reproduce path
Path: /runtime/reproduce.md and Makefile at repo root · v1.0 · 2026.05
The portfolio's claim "the repos are not typography" is defended by a single command that any external reviewer can run. This document specifies that command, the expected output, and the failure modes that mean it didn't work.
A reviewer who cannot reproduce these numbers in 90 seconds on a 2024-era developer laptop should treat the portfolio's quantitative claims as unverified.
The command
git clone https://github.com/coreyalejandro/living-constitution.git
cd living-constitution
make verify
That is the entire reproduce path. No additional setup is needed beyond having git and the tools listed under Required environment (Python, Node, uv, make) installed.
Expected output (target wall-clock $\leq 90$ s)
==> Living Constitution verify
==> Detecting environment
OS: <darwin|linux|win-msys>
Python: 3.12.x
Node: 22.x
==> Resolving dependencies (uv)
OK · 0.8s
==> Constitution test suite
62 passed, 0 failed in 12.3s
==> PROACTIVE detector test suite
212 passed, 0 failed in 41.7s
==> SentinelOS runtime test suite
[LOC count: src 1037 · tests 994]
runtime tests: 88 passed, 0 failed in 18.4s
==> Hashes
constitution-tests : sha256:abc...
proactive-tests : sha256:def...
sentinelos-tests : sha256:123...
==> SUCCESS · all suites green · cumulative wall-clock 73.2s
==> Hash log appended to .verify/verify-2026-05-18T13-04-22Z.log
The hash log is committed and tracked in the repository to enable longitudinal verification: if a future version of this repo reports the same hash, the same tests passed.
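To illustrate why a matching hash implies matching results, here is a minimal sketch that hashes a canonical summary of one suite run. It is not the contents of .verify/sign-and-log.sh, which this document does not reproduce; the helper name suite_digest and the choice of hashed fields are assumptions made for illustration.

```python
import hashlib
import json


def suite_digest(suite: str, passed: int, failed: int) -> str:
    """Return a sha256 digest over a canonical summary of one suite run.

    Hypothetical helper: the real log format is defined by
    .verify/sign-and-log.sh, which is not shown in this document.
    """
    # Sorted keys and fixed separators make the byte string deterministic,
    # so equal results always hash to the same value.
    summary = json.dumps(
        {"suite": suite, "passed": passed, "failed": failed},
        sort_keys=True,
        separators=(",", ":"),
    )
    return "sha256:" + hashlib.sha256(summary.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(suite_digest("constitution-tests", 62, 0))
```

Timings are deliberately left out of the hashed summary in this sketch, since wall-clock durations vary from run to run and would otherwise change the digest without any change in test results.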
What the Makefile does (verbatim, abbreviated)
.PHONY: verify verify-clean install-deps test-constitution test-proactive test-sentinelos hash-log
verify: install-deps test-constitution test-proactive test-sentinelos hash-log
	@echo "==> SUCCESS · all suites green"
install-deps:
	@command -v uv >/dev/null || (echo "ERROR: uv not installed; see https://docs.astral.sh/uv" && exit 1)
	@uv sync --frozen
	@command -v node >/dev/null || (echo "ERROR: node 22+ required" && exit 1)
	@npm ci --silent
test-constitution:
	@uv run pytest -q tests/constitution/
test-proactive:
	@uv run pytest -q tests/proactive/
test-sentinelos:
	@uv run pytest -q tests/sentinelos/
hash-log:
	@mkdir -p .verify
	@.verify/sign-and-log.sh > .verify/verify-$$(date -u +%Y-%m-%dT%H-%M-%SZ).log
verify-clean: verify
	@.verify/check-clean.sh
The Makefile is intentionally short. Complex setup is a reproducibility hazard.
Required environment
| Component | Minimum | Tested |
|---|---|---|
| Python | 3.11 | 3.12.7 |
| Node | 20 | 22.13 |
| uv (Python package manager) | 0.5 | 0.5.4 |
| make | GNU make 3.81 or BSD make | GNU make 4.4 |
| Disk space | 800 MB | — |
| Memory | 4 GB | — |
| Network | required for first clone; not required for verify | — |
Verified on:
- macOS 14.6 (Apple Silicon, x86_64 via Rosetta)
- Ubuntu 22.04, 24.04
- Windows 11 via MSYS2 / Git Bash
CI runs the same make verify on push to main on all three OSes. The CI badge on the README is the canonical "right now" status; the published numbers are the canonical "v1.0 release" status.
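A reviewer who wants to check the minimum versions before running make verify could use a preflight along the following lines. This is a sketch, not part of the repository: the helper names are hypothetical, the minimums are copied from the table above, and the executable name python3 is an assumption that may not hold on Windows.

```python
import re
import shutil
import subprocess

# Minimum versions copied from the Required environment table.
MINIMUMS = {"python3": (3, 11), "node": (20,), "uv": (0, 5)}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number from a tool's --version output."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare only as many components as the minimum specifies."""
    return found[: len(minimum)] >= minimum


def check_tool(name: str, minimum: tuple[int, ...]) -> str:
    """Return a one-line status for a tool expected on PATH."""
    path = shutil.which(name)
    if path is None:
        return f"{name}: MISSING"
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True, text=True, check=False, timeout=10,
        )
        found = parse_version(result.stdout or result.stderr)
    except (OSError, subprocess.SubprocessError, ValueError):
        return f"{name}: UNKNOWN VERSION"
    status = "OK" if meets_minimum(found, minimum) else "TOO OLD"
    return f"{name}: {'.'.join(map(str, found))} ({status})"


if __name__ == "__main__":
    for tool, minimum in MINIMUMS.items():
        print(check_tool(tool, minimum))
```

The check only reports; it does not install anything, which keeps it consistent with the Makefile's approach of failing early with a pointer to the installer.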
What "verify" does not prove
The reproduce path proves that the tests pass. It does not prove:
- That the tests cover what the paper claims they cover. Test coverage and semantic alignment are inspected in the test-design review (/research/test-design-review.md).
- That the PROACTIVE features extract the signal the preprint defines. Feature-level validation is the corpus disclosure's job, not the test suite's.
- That Agent Sentinel reduces real-world harm. That is the pilot evaluations' job.
The repo passing make verify is a necessary condition for the portfolio's quantitative claims; it is not sufficient. The portfolio is honest about this distinction.
Failure modes and remedies
F-1. make verify fails on first run with ERROR: uv not installed
Install uv per the link in the error. This is the most common first-run failure.
F-2. make verify fails partway with test errors
File an issue at https://github.com/coreyalejandro/living-constitution/issues with the full output. We treat reproducibility regressions as P0 bugs.
F-3. make verify succeeds but the hashes do not match the published values
Two possibilities:
1. A patched dependency changed test output. Check .verify/dep-changes.log. If a dep changed, that is a reproducibility incident — file an issue.
2. Network-time skew or locale differences in test output. Run .verify/normalize.sh to canonicalize and retry.
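The second possibility comes down to run-specific noise in the output. The sketch below shows one way such noise could be canonicalized before hashing. It is not the contents of .verify/normalize.sh, which is not shown here; the patterns are assumptions based only on the sample output earlier in this document, and they do not address locale differences.

```python
import re

# Patterns for run-specific noise, inferred from the sample output above;
# the real normalize.sh may handle more (or different) cases.
DURATION = re.compile(r"\b\d+(?:\.\d+)?s\b")
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}Z")


def canonicalize(output: str) -> str:
    """Replace timings and timestamps with placeholders and unify line endings."""
    lines = []
    for line in output.replace("\r\n", "\n").split("\n"):
        line = TIMESTAMP.sub("<timestamp>", line)
        line = DURATION.sub("<duration>", line)
        lines.append(line.rstrip())
    # Exactly one trailing newline, regardless of how the input ended.
    return "\n".join(lines).strip() + "\n"


if __name__ == "__main__":
    print(canonicalize("62 passed, 0 failed in 12.3s\r\n"), end="")
```

Pass and fail counts are left untouched, so two runs that differ in results still canonicalize to different text.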
F-4. make verify exceeds 90 seconds wall-clock
The cause is likely environment-specific (slow disk, CPU thermal throttling, or first-run dependency resolution). Run make verify a second time with dependencies cached; the runtime should drop to $\leq 60$ s.
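To compare a run against the 90 s and 60 s targets without a stopwatch, a small wrapper can time the command. This is a sketch under the assumption that it is invoked from the repo root; run_with_budget is a hypothetical helper, not something the repository ships.

```python
import subprocess
import sys
import time


def run_with_budget(command: list[str], budget_s: float) -> tuple[int, float, bool]:
    """Run a command and return (exit code, elapsed seconds, within budget)."""
    start = time.perf_counter()
    completed = subprocess.run(command, check=False)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed, elapsed <= budget_s


if __name__ == "__main__":
    # Demo with a trivial command; in the repo root, substitute
    # ["make", "verify"] and a budget of 90.0 (first run) or 60.0 (cached).
    code, elapsed, ok = run_with_budget([sys.executable, "-c", "pass"], budget_s=90.0)
    print(f"exit={code} elapsed={elapsed:.1f}s within_budget={ok}")
```

Reporting the exit code separately from the budget check keeps a slow but green run distinguishable from a failed one, which matters when deciding between F-2 and F-4.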
F-5. CI is red on main
We do not publish a verify-green claim while CI is red. The badge reflects current state. The README links to the last green commit and the cause of the red.
Reproducibility incident protocol
Any failure of make verify to reproduce the published numbers triggers:
1. An issue marked reproducibility with full output.
2. A response within 48 hours from the maintainer.
3. A root-cause analysis within 7 days.
4. A post-mortem at /post-mortems/ if the root cause involves the repository (not the reporter's environment).
5. If the published numbers were materially wrong, a correction notice on the homepage and a version bump on the affected components.
Reproducibility is treated as a first-class property of this project. The cost of make verify failing is high enough that we are willing to set explicit incident response expectations.
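The response windows in the protocol translate into concrete deadlines. The sketch below computes them for a given report time. It assumes both windows are measured from the moment the issue is filed, which the protocol does not state explicitly; incident_deadlines is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Windows taken from steps 2 and 3 of the protocol.
RESPONSE_WINDOW = timedelta(hours=48)
ROOT_CAUSE_WINDOW = timedelta(days=7)


def incident_deadlines(reported_at: datetime) -> dict[str, datetime]:
    """Return UTC deadlines for the maintainer response and root-cause analysis.

    Assumption: both windows start when the issue is reported.
    """
    if reported_at.tzinfo is None:
        raise ValueError("reported_at must be timezone-aware")
    reported_utc = reported_at.astimezone(timezone.utc)
    return {
        "maintainer_response": reported_utc + RESPONSE_WINDOW,
        "root_cause_analysis": reported_utc + ROOT_CAUSE_WINDOW,
    }


if __name__ == "__main__":
    reported = datetime(2026, 5, 18, 13, 4, 22, tzinfo=timezone.utc)
    for name, due in incident_deadlines(reported).items():
        print(f"{name}: {due.isoformat()}")
```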
Future hardening
- v1.1 (Q3 2026). Add a nix flake check path for fully pinned reproducibility independent of uv/npm.
- v1.2 (Q4 2026). Add timestamping of verify runs to OpenTimestamps for tamper-evidence on a chain of reproducibility receipts over time.
- v1.3 (2027). Build a public reproducibility dashboard showing weekly make verify runs by independent reproducers (opt-in submissions).