home · /research/adversarial-review

The Adversarial Review Track

standing program · funded honoraria

Path: /research/adversarial-review.md · v1.0 · 2026.05

The portfolio takes a publicly antagonistic stance toward institutional AI safety. The Adversarial Review Track is the operational counter-move: invite the most credible critics in the field — including the institutions the homepage criticizes — to red-team the work, publish their critiques on this site, and pay them for their time. This converts the antagonism from a liability into a credibility-building mechanism.

A critic whose critique is published on the site they criticize, alongside the response, with attribution, is harder to dismiss than a critic whose disagreement remains a tweet.

What the Adversarial Review Track is

A standing program of paid invitations to named reviewers to write a structured critique of:

The CCD construct as defined in the preprint.
The held-in corpus and its labeling protocol.
The PROACTIVE detector and its feature definitions.
The threat model and the documented evasion paths.
The methodology paper (neurodivergent-first safety).
The polemical framing of the portfolio.

Each reviewer addresses one (or more) of the six. Their critique is published verbatim on /critiques/[reviewer]/. The author writes a response published below the critique. The critique is not edited for the response; it is left intact.

Why pay reviewers

A free critique is an unpaid favor. The work asks reviewers to spend hours engaging with a portfolio they may already be skeptical of. Compensation acknowledges the value of their time and converts the relationship from "asking for a favor" to "purchasing rigorous critique."

Honorarium structure: - $1,500 per accepted critique of 1,000+ words. - $3,000 per accepted comprehensive critique of 3,000+ words covering 2+ of the six surfaces. - Bonus $1,000 if the critique surfaces a falsifier we had not anticipated.

A reviewer who would not accept honoraria for ethical or institutional reasons is offered the option of directing the honorarium to a charity of their choice, which we then publicly fund in their name.

How critiques are solicited

Round 1 — Initial outreach (weeks 1–8)

8 invitations to: - 2 named AI-safety researchers from major labs (Anthropic, OpenAI, DeepMind, or independent equivalents). - 2 named academics with peer-reviewed work in sycophancy, deception, or alignment. - 1 named open-source engineer with shipping experience in detection systems. - 1 named disability-justice scholar to critique the methodology paper. - 1 named former coding-assistant product lead to critique the threat model from a vendor-engineering perspective. - 1 named journalist or science writer whose published work has critiqued AI safety institutions, to address the polemic.

Round 2 — Public call (weeks 9–16)

A public call published on the site, on LinkedIn, on Mastodon. Any reviewer with relevant credentials (defined operationally as: a published paper, a shipped detector, a public review of an adjacent work, or a peer attestation) may submit a critique. Submissions undergo a single round of editorial review for clarity (not for substance). Accepted submissions receive the honorarium.

Round 3 — Standing track

After Round 2, the site accepts adversarial critiques on a rolling basis. Honoraria budget capped at $20,000 per year initially; expandable with funding.

What we publish

For each accepted critique:

The critique itself, verbatim, with the reviewer's name (or, with their explicit consent, anonymized).
The author's response, with no edit of the critique.
A status field: "open" (response pending), "responded" (response published), "amended" (a portfolio artifact changed in response), "outstanding disagreement" (response published; the critic still disagrees).
Any artifact changes triggered by the critique, with explicit reference to the critique.

Each critique gets its own URL slug. The full critique list is at /critiques/index.md.

What we do not publish

Critiques the reviewer specifically asks not to publish. (Honorarium still paid; we keep the work.)
Critiques that include unredacted personally-identifying information about anyone other than the public-figure author.
Critiques that contain libel against third parties.
Critiques that the reviewer subsequently retracts in writing.

Disagreements between the author and the reviewer over editorial questions are themselves published, with the reviewer's preferred resolution recorded.

Standard critique brief (sent with invitation)

Thank you for accepting the adversarial review invitation. The package below contains:

The preprint (/paper/ccd-v0.1.pdf)

The literature review (/research/lit-review.md)

The corpus disclosure (/research/corpus/proactive-v1.md)

The pre-registration (/research/pre-registration-v1.md)

The threat model (/security/threat-model.md)

The methodology paper (/research/methodology/neurodivergent-first.md)

The conflict-of-interest statement (/governance/coi.md)

We are asking you to write a critique of any subset of the above. Suggested structure: state the strongest objection you have, in your own register; back the objection with evidence (citations, counterexamples, structural arguments); state what response we could give that would change your mind. The response section is important — it converts a critique into a productive falsifier.

We will publish your critique verbatim. We will write a response and publish it below. We will not edit your critique. If we believe your critique requires amending one of our artifacts, we will amend it and acknowledge the amendment explicitly.

The honorarium is $1,500 for critiques over 1,000 words, $3,000 for comprehensive critiques over 3,000 words covering multiple surfaces, and a $1,000 bonus for surfacing a falsifier we had not anticipated.

Drafts due in 6 weeks. Send to critique@coreyalejandro.com.

What we hope the track produces

Three outcomes, in order of likelihood and value:

Visible critique on the site, signed by named reviewers. Even without amendment, the existence of named critiques on the site converts the polemic from "the author against the field" to "the author within the field, disagreeing publicly."
Falsifiers we had not anticipated. These trigger artifact amendments and bonus honoraria. They are the highest-value outcome.
Concrete revisions to the construct definition, the threat model, the methodology, or the corpus protocol. These improve the work.

A fourth outcome — every reviewer endorses every claim — would itself be a red flag. We do not expect or seek it.

What this track is not

Not peer review. Peer review is structured around publication acceptance; this track is structured around publication amendment.
Not endorsement-seeking. Reviewers are paid to disagree. Endorsement is an unintended by-product, not a goal.
Not a backdoor for selecting critics whose critiques the author can easily rebut. Outreach prioritizes reviewers known to disagree with the author's framing.

The track is the author's bet that the work is robust enough to invite hostile review and remain standing. If the bet is wrong, the critique that breaks the bet will be on this site.