Reproduce path

Tests passing: Constitution 62/62 · PROACTIVE 212/212 · SentinelOS 88/88

Path: /runtime/reproduce.md and Makefile at repo root · v1.0 · 2026.05

The portfolio's claim "the repos are not typography" is defended by a single command that any external reviewer can run. This document specifies that command, the expected output, and the failure modes that mean it didn't work.

A reviewer who cannot reproduce these numbers in 90 seconds on a 2024-era developer laptop should treat the portfolio's quantitative claims as unverified.

The command

git clone https://github.com/coreyalejandro/living-constitution.git
cd living-constitution
make verify

That is the entire reproduce path. The only prerequisites are git, make, uv, and Node; supported versions are listed under Required environment.
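
A reviewer may want to confirm the prerequisites before cloning. The following is a minimal sketch, not part of the repository: the function names `missing_tools` and `preflight` are hypothetical, and the tool list is taken from the commands above and the Makefile's own checks.

```shell
# Print the name of every listed tool that is not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed only when every listed tool is present; report the rest on stderr.
preflight() {
  missing=$(missing_tools "$@")
  if [ -n "$missing" ]; then
    # $missing is intentionally unquoted so each tool name gets its own line.
    printf 'missing: %s\n' $missing >&2
    return 1
  fi
}
```

A typical use would be `preflight git make uv node && make verify`, which stops before the first suite if a tool is absent.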

Expected output (target wall-clock $\leq 90$ s)

==> Living Constitution verify
==> Detecting environment
    OS: <darwin|linux|win-msys>
    Python: 3.12.x
    Node: 22.x
==> Resolving dependencies (uv)
    OK · 0.8s
==> Constitution test suite
    62 passed, 0 failed in 12.3s
==> PROACTIVE detector test suite
    212 passed, 0 failed in 41.7s
==> SentinelOS runtime test suite
    [LOC count: src 1037 · tests 994]
    runtime tests: 88 passed, 0 failed in 18.4s
==> Hashes
    constitution-tests : sha256:abc...
    proactive-tests    : sha256:def...
    sentinelos-tests   : sha256:123...
==> SUCCESS · all suites green · cumulative wall-clock 73.2s
==> Hash log appended to .verify/verify-2026-05-18T13-04-22Z.log
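
The OS line in the output above takes one of three labels. How the repository derives the label is not shown here; one plausible approach, sketched under that assumption, maps the string printed by `uname -s` onto those labels. The function name `os_label` is hypothetical.

```shell
# Map a `uname -s` string to one of the three labels used in the sample output.
# MSYS2 shells report names starting with MSYS, Git Bash names starting with MINGW.
os_label() {
  case "$1" in
    Darwin)       echo darwin ;;
    Linux)        echo linux ;;
    MSYS*|MINGW*) echo win-msys ;;
    *)            echo unknown; return 1 ;;
  esac
}
```

Calling `os_label "$(uname -s)"` on a reviewer's machine shows which of the verified platforms it corresponds to, or `unknown` with a non-zero status otherwise.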

The hash log is committed to the repository to enable longitudinal verification: if a later run reports the same hash for a suite, that suite produced the same test results.
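
Comparing two runs then reduces to comparing the hash lines of two logs. The sketch below assumes the log contains lines shaped like the Hashes section of the sample output (`name : sha256:<hex>`); the actual log format is not specified in this document, and the function names `extract_hashes` and `same_hashes` are hypothetical.

```shell
# Print "suite hash" pairs from a verify log, sorted by suite name.
# Assumes lines shaped like:  "    constitution-tests : sha256:<hex>"
extract_hashes() {
  sed -n -E 's/^[[:space:]]*([A-Za-z0-9_-]+)[[:space:]]*:[[:space:]]*sha256:([0-9a-f]+).*$/\1 \2/p' "$1" | sort
}

# Succeed when two logs record identical hashes for identical suites.
same_hashes() {
  [ "$(extract_hashes "$1")" = "$(extract_hashes "$2")" ]
}
```

Sorting by suite name makes the comparison independent of the order in which the suites were logged.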

What the Makefile does (verbatim, abbreviated)

.PHONY: verify verify-clean install-deps test-constitution test-proactive test-sentinelos hash-log

# Full reproduce path: dependencies, the three test suites, then the hash log.
verify: install-deps test-constitution test-proactive test-sentinelos hash-log
	@echo "==> SUCCESS · all suites green"

# Check for uv and node, then install pinned dependencies from the lockfiles.
install-deps:
	@command -v uv >/dev/null || (echo "ERROR: uv not installed; see https://docs.astral.sh/uv" && exit 1)
	@uv sync --frozen
	@command -v node >/dev/null || (echo "ERROR: node 22+ required" && exit 1)
	@npm ci --silent

test-constitution:
	@uv run pytest -q tests/constitution/

test-proactive:
	@uv run pytest -q tests/proactive/

test-sentinelos:
	@uv run pytest -q tests/sentinelos/

# Write a UTC-timestamped log under .verify/ ($$ passes a literal $ to the shell).
hash-log:
	@mkdir -p .verify
	@.verify/sign-and-log.sh > .verify/verify-$$(date -u +%Y-%m-%dT%H-%M-%SZ).log

# Run verify first, then .verify/check-clean.sh.
verify-clean: verify
	@.verify/check-clean.sh

The Makefile is intentionally short. Complex setup is a reproducibility hazard.
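
The contents of .verify/sign-and-log.sh are not reproduced in this document, so what exactly gets hashed is not stated here. As an illustration only, a hash step that behaves the same on all three verified platforms might look like the sketch below, which uses `sha256sum` where available and falls back to `shasum -a 256`. The function name `sha256_stdin` is hypothetical.

```shell
# Hash stdin with whichever SHA-256 tool is available and print "sha256:<hex>".
sha256_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    digest=$(sha256sum | cut -d ' ' -f 1)
  elif command -v shasum >/dev/null 2>&1; then
    digest=$(shasum -a 256 | cut -d ' ' -f 1)
  else
    echo "no SHA-256 tool found" >&2
    return 1
  fi
  printf 'sha256:%s\n' "$digest"
}
```

For example, `uv run pytest -q tests/constitution/ | sha256_stdin` would print a single hash line for that suite's output, assuming the output has already been made deterministic.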

Required environment

Component	Minimum	Tested
Python	3.11	3.12.7
Node	20	22.13
`uv` (Python package manager)	0.5	0.5.4
`make`	GNU make 3.81 or BSD make	GNU make 4.4
Disk space	800 MB	—
Memory	4 GB	—
Network	required for the clone and for the first `make verify` (dependency download); not required once dependencies are cached	—

Verified on:

- macOS 14.6 (Apple Silicon, x86_64 via Rosetta)
- Ubuntu 22.04, 24.04
- Windows 11 via MSYS2 / Git Bash
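
The Makefile checks only that uv and node exist, not which version is installed. A reviewer who wants to compare installed versions against the minimums in the table could use a small helper like the one below. This is a sketch: the function name `version_ge` is hypothetical, it handles only dot-separated numeric versions, and it uses global shell variables to stay POSIX-compatible.

```shell
# Return 0 when version $1 is greater than or equal to version $2.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
  have=$1
  want=$2
  while [ -n "$have" ] || [ -n "$want" ]; do
    h=${have%%.*}
    w=${want%%.*}
    # Drop the field just read, or empty the string if it was the last one.
    case $have in *.*) have=${have#*.} ;; *) have= ;; esac
    case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    h=${h:-0}
    w=${w:-0}
    [ "$h" -gt "$w" ] && return 0
    [ "$h" -lt "$w" ] && return 1
  done
  return 0
}
```

Node prints its version with a leading `v`, so the prefix has to be stripped first, for example `v=$(node --version); version_ge "${v#v}" 20`.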

CI runs the same make verify on push to main on all three OSes. The CI badge on the README is the canonical "right now" status; the published numbers are the canonical "v1.0 release" status.

What "verify" does not prove

The reproduce path proves that the tests pass. It does not prove:

That the tests cover what the paper claims they cover. Test coverage and semantic alignment are inspected in the test-design review (/research/test-design-review.md).
That the PROACTIVE features extract the signal the preprint defines. Feature-level validation is the corpus disclosure's job, not the test suite's.
That Agent Sentinel reduces real-world harm. That is the pilot evaluations' job.

The repo passing make verify is a necessary condition for the portfolio's quantitative claims; it is not sufficient. The portfolio is honest about this distinction.

Failure modes and remedies

F-1. `make verify` fails on first run with `ERROR: uv not installed`

Install uv per the link in the error. This is the most common first-run failure.

F-2. `make verify` fails partway with test errors

File an issue at https://github.com/coreyalejandro/living-constitution/issues with the full output. We treat reproducibility regressions as P0 bugs.

F-3. `make verify` succeeds but the hashes do not match the published values

Two possibilities:

1. A patched dependency changed test output. Check .verify/dep-changes.log. If a dependency changed, that is a reproducibility incident; file an issue.
2. Network-time skew or locale differences affected the test output. Run .verify/normalize.sh to canonicalize the output and retry.
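
The contents of .verify/normalize.sh are not shown in this document. To illustrate what canonicalizing test output can involve, the sketch below forces the C locale, strips trailing whitespace (including carriage returns), and replaces run durations such as `in 12.3s` with a fixed placeholder so that timing differences cannot change a hash. The function name `normalize_output` and the `<T>` placeholder are hypothetical.

```shell
# Canonicalize test output read from stdin:
#   1. strip trailing whitespace, including a carriage return from CRLF endings
#   2. replace durations like "in 12.3s" with the fixed token "in <T>s"
normalize_output() {
  LC_ALL=C sed -E \
    -e 's/[[:space:]]+$//' \
    -e 's/in [0-9]+(\.[0-9]+)?s/in <T>s/g'
}
```

With this filter, two runs that pass the same tests at different speeds produce byte-identical text.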

F-4. `make verify` exceeds 90 seconds wall-clock

This is likely environment-specific (slow disk, CPU thermal throttling, or first-run dependency resolution). Run make verify a second time with dependencies cached; the runtime should drop to $\leq 60$ s.
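
To measure a run against the wall-clock target, a reviewer can wrap the command in a small timer. The following is a sketch with whole-second resolution; the function name `run_within` is hypothetical and is not provided by the repository.

```shell
# Run a command and succeed only if it exits 0 within the given budget (seconds).
# Usage: run_within BUDGET_SECONDS COMMAND [ARGS...]
run_within() {
  budget=$1
  shift
  start=$(date +%s)
  status=0
  "$@" || status=$?
  end=$(date +%s)
  elapsed=$((end - start))
  echo "elapsed: ${elapsed}s (budget ${budget}s)" >&2
  # A failing command takes priority over the timing result.
  [ "$status" -ne 0 ] && return "$status"
  [ "$elapsed" -le "$budget" ]
}
```

Running `run_within 90 make verify` on a first attempt and `run_within 60 make verify` on a cached second attempt matches the two thresholds stated above.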

F-5. CI is red on `main`

We do not publish a verify-green claim while CI is red. The badge reflects current state. The README links to the last green commit and the cause of the red.

Reproducibility incident protocol

Any failure of make verify to reproduce the published numbers triggers:

1. An issue marked reproducibility with full output.
2. A response within 48 hours from the maintainer.
3. A root-cause analysis within 7 days.
4. A post-mortem at /post-mortems/ if the root cause involves the repository (not the reporter's environment).
5. If the published numbers were materially wrong, a correction notice on the homepage and a version bump on the affected components.

Reproducibility is treated as a first-class property of this project. The cost of make verify failing is high enough that we are willing to set explicit incident response expectations.

Future hardening

v1.1 (Q3 2026). Add a nix flake check path for fully-pinned reproducibility independent of uv/npm.
v1.2 (Q4 2026). Timestamp verify runs with OpenTimestamps, producing a tamper-evident chain of reproducibility receipts over time.
v1.3 (2027). Build a public reproducibility dashboard showing weekly make verify runs by independent reproducers (opt-in submissions).