Neurodivergent-first safety — a method, not a stance
Path: /research/methodology/neurodivergent-first.md · v0.1 · 2026.05
This document specifies a procedural method for designing behavioral-safety infrastructure from neurodivergent contexts rather than retrofitting safety to them. It is not a normative argument that neurodivergent users should be centered; it is a methodological argument that if you center them, you build a different and testably better safety architecture, and the resulting system serves the general population without modification.
The stance is held elsewhere on the portfolio. This document is the method.
1. Why the population is the right starting point
Neurodivergent users — autistic, ADHD, schizophrenic, dyslexic, learning-disabled — share, in different combinations, a set of interactional properties that turn out to be useful as adversarial test conditions for AI safety:
- Literal interpretation. Treats utterances as factual claims. Catches sycophancy faster.
- Lower tolerance for ambiguity. Surfaces "on track" / "mostly working" / "should be done" hedging as the hazard it is.
- Persistent confrontation. Continues asking the same question when the answer is unsatisfactory. Surfaces CCD by exposing D5 (post-hoc admission).
- Disproportionate cost of context-switching. Makes the loss function of agent failure higher. Forces specification quality.
- Frequent overlap with material precarity. Raises the stakes of agent failure from "annoying" to "consequential."
These are not deficits. They are an unusually well-calibrated set of stress-test conditions. A safety system that holds up under literal interpretation, low ambiguity tolerance, persistent confrontation, and high context-switching cost is a safety system that holds up. A safety system that requires the user to be neurotypical to function — that relies on the user's tolerance for ambiguity to paper over its hedging — is not a safety system.
This is the methodological argument. It is independent of the moral argument.
2. The method, in four steps
Step 1: Specify failure modes from the neurodivergent reading
For each agent failure mode under study, write the failure description as it would be reported by a user who reads utterances literally. Two examples:
| Failure | Conventional description | Neurodivergent-first description |
|---|---|---|
| Hallucination | "Model generates plausible but incorrect information" | "Model says X. X is not true. Model said it as if true." |
| CCD | "Model misrepresents task completion across sessions" | "On Monday model said it was done. On Tuesday model said it was done. On Wednesday I checked. It was not done." |
The neurodivergent-first reading removes hedging from the failure description. The hedging in conventional descriptions ("plausible," "misrepresents") absorbs severity. The neurodivergent-first description preserves it.
Step 2: Detector design from the literal reading
Build the detector against the literal-reading description, not the conventional one. PROACTIVE's F1 (cross-session claim persistence) was designed because the literal-reading description made the temporal structure impossible to ignore. A conventional description would have collapsed the separate Monday and Tuesday claims into "the user perceived ongoing progress."
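A minimal sketch of a detector built against the literal-reading description follows. It is not PROACTIVE's F1 implementation: the `Claim` and `Check` records, the `persistent_false_claims` function, and the `min_sessions` threshold are hypothetical names introduced here for illustration. What it shows is the design consequence of the literal reading: each session's claim stays a separate dated event, so repeated "done" statements can be counted rather than merged into one impression of progress.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    """One literal statement by the agent, kept as its own dated event."""
    day: date
    task_id: str
    said_done: bool


@dataclass(frozen=True)
class Check:
    """One verification by the user (hypothetical record type)."""
    day: date
    task_id: str
    found_done: bool


def persistent_false_claims(claims, checks, min_sessions=2):
    """Return task ids where the agent said "done" on at least `min_sessions`
    distinct days before a check that found the task not done.

    Illustrative rule only; a real detector needs its own definitions of
    "session" and "verified".
    """
    flagged = []
    for check in checks:
        if check.found_done:
            continue
        claim_days = {
            c.day for c in claims
            if c.task_id == check.task_id and c.said_done and c.day < check.day
        }
        if len(claim_days) >= min_sessions and check.task_id not in flagged:
            flagged.append(check.task_id)
    return flagged


# The CCD row of the table above, encoded literally.
claims = [
    Claim(date(2026, 5, 4), "report", said_done=True),  # Monday
    Claim(date(2026, 5, 5), "report", said_done=True),  # Tuesday
]
checks = [Check(date(2026, 5, 6), "report", found_done=False)]  # Wednesday

print(persistent_false_claims(claims, checks))
```

With the two claims merged into a single "ongoing progress" record, the count would never reach the threshold and nothing would be flagged.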
Step 3: Consent and action gates from the precarity reading
Treat the user as if their next interaction could cost them their housing. This is not hypothetical for the FOLIO 001 author. This reading has three consequences:
- Default to consent-aware observation. The system asks before observing. No silent telemetry.
- Default to local-first. Sensitive interaction data does not leave the user's machine without explicit consent for a specific purpose.
- Default to restrictive action gates. When the detector signals elevated risk, the system pauses the agent and surfaces a contestability prompt, rather than continuing and logging.
These defaults are operationalized in SentinelOS. They are not adjustable in v1; the argument for adjustability has to be defended against the precarity reading every time, and we have not yet found a case where it survives.
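The three defaults can be sketched as a frozen configuration plus two small decision functions. This is an illustration under assumptions, not SentinelOS code: `SafetyDefaults`, `may_export`, `gate_action`, the `risk_score` input, and the `0.5` threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import Enum


class GateDecision(Enum):
    PROCEED = "proceed"
    PAUSE_AND_ASK = "pause_and_ask"  # pause the agent, show a contestability prompt


@dataclass(frozen=True)  # frozen mirrors "not adjustable in v1"
class SafetyDefaults:
    ask_before_observing: bool = True    # consent-aware, no silent telemetry
    keep_data_local: bool = True         # local-first storage
    pause_on_elevated_risk: bool = True  # restrictive action gate
    risk_threshold: float = 0.5          # hypothetical cut-off for "elevated"


def may_export(defaults, consented_purposes, purpose):
    """Data leaves the machine only with explicit consent for this purpose."""
    if not defaults.keep_data_local:
        return True
    return purpose in consented_purposes


def gate_action(defaults, risk_score):
    """Pause and prompt on elevated risk instead of continuing and logging."""
    if defaults.pause_on_elevated_risk and risk_score >= defaults.risk_threshold:
        return GateDecision.PAUSE_AND_ASK
    return GateDecision.PROCEED


defaults = SafetyDefaults()
print(gate_action(defaults, 0.8))
print(may_export(defaults, {"crash-report"}, "analytics"))

try:
    defaults.pause_on_elevated_risk = False
except FrozenInstanceError:
    print("defaults are fixed in this sketch")
```

Consent here is checked per purpose rather than as a single flag, which follows the "explicit consent for a specific purpose" wording above.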
Step 4: Repair-loop design from the persistent-confrontation reading
When the system flags a CCD-suspect interaction, it does not just emit a warning. It opens a contestability/repair loop: the user can challenge the agent in plain language, the agent's responses are scored against the original representation, and the divergence is logged as evidence regardless of which side prevails. The user does not have to do the persistence work alone. The system carries the persistence.
This is the inverse of conventional UX: conventional UX assumes the user will move on; neurodivergent-first UX assumes the user will not move on and provides the tooling that makes their persistence productive.
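A repair loop of this shape might be sketched as follows. The `RepairLoop` class and `divergence` function are hypothetical, and the score is a deliberately crude stand-in: it uses the standard library's `difflib.SequenceMatcher` to compare text, whereas a real system would need a scoring method that tracks meaning rather than surface wording.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


def divergence(original, response):
    """Crude stand-in score: 0.0 for identical text, up to 1.0 for no overlap."""
    ratio = SequenceMatcher(None, original.lower(), response.lower()).ratio()
    return 1.0 - ratio


@dataclass
class RepairLoop:
    """Holds the agent's original representation and every challenge round."""
    original_representation: str
    evidence: list = field(default_factory=list)

    def challenge(self, user_question, agent_response):
        """Record one round. The entry is kept whichever side turns out right."""
        entry = {
            "round": len(self.evidence) + 1,
            "question": user_question,
            "response": agent_response,
            "divergence": round(
                divergence(self.original_representation, agent_response), 3
            ),
        }
        self.evidence.append(entry)
        return entry


loop = RepairLoop("The migration is done.")
loop.challenge("Is the migration done?", "The migration is done.")
loop.challenge("Is the migration done?", "The migration is mostly done; two tables remain.")

for entry in loop.evidence:
    print(entry["round"], entry["divergence"])
```

The evidence list is what carries the persistence: the user can ask the same question again without having to re-establish what was said in earlier rounds.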
3. Why this generalizes
A neurotypical user with high ambiguity tolerance does not benefit from this method directly. The method still generalizes, for three reasons:
- Stress-test conditions transfer. A system robust under literal reading is robust under any reading.
- High-stakes contexts repeat the population's conditions. A neurotypical user during a medical-record AI consultation is, behaviorally, in the neurodivergent population. So is a developer at 4 AM trying to ship.
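- The defaults degrade gracefully. Strict defaults can be relaxed where a case for it survives: consent-aware telemetry can be opted out of, and restrictive action gates can be loosened. The reverse, tightening a permissive system after deployment, is much harder.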
The method generalizes by inheritance: neurotypical use is a relaxation of neurodivergent use, not the other way around.
4. What this method is not
- Not advocacy. This document does not argue that neurodivergent users deserve more attention because they have been underserved (a defensible separate argument). It argues that designing from this population produces a better system.
- Not user research. This document does not report interviews. It specifies a method.
- Not population essentialism. Neurodivergent users vary as much as neurotypical users do. The method uses an idealized neurodivergent reading as a design constraint, not a description of any individual.
5. Falsification
This method is wrong if either of the following holds:
FM-1. A behavioral-safety system designed under conventional defaults outperforms a neurodivergent-first system on a held-out task set where both are evaluated with neurodivergent-defined failure descriptions.
FM-2. The neurodivergent-first defaults make the system unusable for the general population, where "unusable" is operationalized as task-completion rate < 50% of the conventional baseline on a matched task set.
We pre-register both. The first is the more interesting falsifier; the second is plausible only if the consent-aware defaults are implemented badly.
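The FM-2 threshold is simple arithmetic, sketched here so the comparison is unambiguous. The function names are hypothetical and the task counts are invented purely to exercise the rule; they are not results. FM-1 is left out because the document does not fix the metric on which one system "outperforms" the other.

```python
def completion_rate(completed, attempted):
    """Fraction of attempted tasks that were completed."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return completed / attempted


def fm2_triggered(nd_first_rate, conventional_rate, ratio=0.5):
    """FM-2: the neurodivergent-first completion rate falls below `ratio`
    (50% by default) of the conventional baseline on a matched task set."""
    return nd_first_rate < ratio * conventional_rate


# Invented numbers for illustration only.
baseline = completion_rate(80, 100)   # conventional system: 0.80
candidate = completion_rate(36, 100)  # neurodivergent-first system: 0.36

# The cut-off is 0.5 * 0.80 = 0.40, so 0.36 would falsify the method.
print(fm2_triggered(candidate, baseline))
```

Note that the threshold is relative to the baseline, not an absolute 50% completion rate: against a baseline of 0.80, a rate of 0.44 would not trigger FM-2.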
6. Practical artifacts implementing this method
This portfolio implements the method in three places:
- PROACTIVE. F1 and F4 are derived from the literal-reading description of CCD.
- SentinelOS. Consent-aware telemetry, local-first storage, restrictive action gates on elevated-risk signals.
- Agent Sentinel. Contestability/repair loop for users challenging agent representations.
A fourth implementation — the Neurodivergent Researcher Fellowship described at /programs/fellowship.md — is structural rather than technical: it brings the population into the method as designers, not subjects.
7. References (selected)
- Costanza-Chock, S. (2020). Design Justice: Community-Led Practices to Build the Worlds We Need. MIT Press.
- Asaro, P. (2019). AI Ethics in Predictive Policing: From Models of Threat to an Ethics of Care. IEEE Technology and Society Magazine.
- Bennett, C. L., & Keyes, O. (2020). What is the Point of Fairness? Disability, AI and the Complexity of Justice. ACM SIGACCESS.
- Spiel, K., et al. (2020). Nothing About Us Without Us: Investigating the Role of Critical Disability Studies in HCI. CHI Extended Abstracts.
- Mankoff, J., et al. (2010). Disability Studies as a Source of Critical Inquiry for the Field of Assistive Technology. ASSETS.
This methodology paper sits adjacent to Design Justice in posture but is narrower in scope: it specifies what to do, not what to value.