Three in a Hundred: Measuring the Sexual-Offence Justice Gap, Diagnosing Its Structure, and Simulating Its Repair
AbstractOf every hundred rapes recorded by the police in England and Wales, about three end in a charge and roughly two in a conviction. This paper takes that number apart and asks what would move it, reading the public record of how the system counts, resolves and fails these cases as structure, at three scales, with formulas borrowed from economics, statistics, machine learning, data mining, signal processing, stochastic processes, information theory and graph theory. The numbers that frame the field measure regime, not harm: recorded rape is spread across European states with a Gini coefficient of 0.52, as unequally as household income in Brazil. The outcomes those numbers meet, modelled on ~2 million rows of Home Office open data, are steep and uneven: only 3.5% of recorded rapes are charged, half are closed because the complainant withdraws, the charge rate varies almost fourfold across forces (18 of 44 outside statistical control limits), and a calibrated model shows offence type and the force — the location — drive whether a case is charged. The failures behind those outcomes are not formless: predictable bundles, three latent archetypes dominated by institutional self-protection, a decision grammar that recognises harm only when forced, an agency network with no single point of failure — so the breakdown is unread signal, not missing connection. Finally, simulations make the repair quantitative: supporting complainants so fewer withdraw roughly trebles convictions per recorded rape (the single biggest lever); closing the postcode lottery adds one to four thousand charges a year; and moving recognition upstream lifts the simulated share of cases reaching prosecution from 39% to 50%. The contribution is a reusable method, calibrated outcome models, three reform simulations, a structural leverage index, and a map from each pattern to a named reform owner; the work performs record-integrity analysis only, and excludes by design any judgement of a complainant's credibility.
Summary
The argument
Of every hundred rapes recorded in England and Wales, about three are charged and two convicted. That number is not a fact of nature — it is a property of a system, and a system can be changed. The paper follows an offence through that system to show how it works now, diagnose why it fails, and simulate how it could do better: how a case is counted, what outcome it meets, the mechanism behind that outcome, and — quantified by simulation, not asserted — what repair would buy.
I — the numbers mislead, predictably
Recorded sexual-offence counts measure the measurement regime — definition breadth, counting rule, recording propensity — far more than they measure harm. Borrowing the economist’s tools for inequality, the Lorenz curve of recorded rape across 33 European states has a Gini coefficient of 0.52: the counts are spread about as unequally as household income in Brazil. Step-changes in national series track legal reform, not danger. A league table of these numbers ranks regimes, not risk.
II — the outcome surface (≈2 million records)
We structured roughly two million rows of Home Office police-outcomes open data — the fate of every offence recorded in England and Wales — and fitted statistical models of the system’s response:
- The modal response to a recorded rape is not a charge. Only 3.5% of recorded rapes are charged; 49.7% are closed as “evidential difficulties — victim does not support action.” For other sexual offences it is 6.9% charged, 30.8% withdrawal. The attrition sits far upstream of the courtroom.
- A justice postcode lottery. The force charge rate ranges 2.8%–11.0% — and a funnel plot (the tool clinical audit uses) puts 18 of 44 forces outside the 99.8% control limits, beyond chance. Two people reporting the same offence in different areas do not face the same system.
- Offence leads, place is second. A calibrated gradient-boosting model (and a binomial GLM) show offence type is the strongest driver of a charge and the police force the second; recorded rape has about half the odds of a charge of other sexual offences. The model is well calibrated, so it can assign a defensible charge probability by location and offence — a management instrument, not just a description.
III — the mechanism: failure has a grammar
Read as structure, the failure record gives up patterns a narrative reading misses:
- Failures arrive in predictable bundles. Association-rule mining finds, for example, that wherever the record shows reputation placed over protection, an allegation had gone unacted-upon — every time it occurred. These are candidate early-warning rules.
- Three hidden causes. Non-negative matrix factorisation collapses fifteen failure types onto three latent archetypes, the heaviest of which is institutional self-protection, not incompetence.
- A decision grammar. Modelled as an absorbing Markov chain, a typical referral is as likely to end in a retrospective external review as in a prosecution — the system most reliably recognises harm only once an inquiry forces it to.
- A dense core with no single point of failure. The agencies already form a network with zero articulation points. Nothing was disconnected. What failed was not the link but the signal it carried — and information theory locates exactly where that signal was loudest and most ignored.
The decisive finding is that one: the institutions were connected and the predictive signal was present; what failed was transmission and recognition. That reframes the standard “information-sharing” remedy, which funds connection a structural reading says already exists.
IV — simulating the repair
The diagnosis is descriptive; reform is a counterfactual question best answered by simulation. Three experiments on the same data put numbers — with stated, conservative assumptions and honest intervals — on what would change:
- The biggest lever is complainant support. A Monte-Carlo attrition cascade (20,000 cohorts) puts the baseline at ~21 convictions per 1,000 recorded rapes. Levelling charging up to the top-quartile force gives ~×1.9; supporting complainants so fewer withdraw gives ~×3.5 — nearly twice as much, because withdrawal is the dominant outcome — and both together ~×4.4.
- Closing the postcode lottery. A counterfactual that brings laggard forces up to a common rate adds ~1,200 charges/year at the median, ~2,300 at the 75th, ~4,000 at the 90th percentile — no new law, just the spread the funnel says is beyond chance.
- Moving recognition upstream. Reweighting the decision-grammar Markov chain toward early recognition shifts the simulated share of cases reaching prosecution from 39% to 50%, cuts no-further-action from 21% to 14%, and shortens the pathway.
These turn the reform map below from a list of good ideas into quantified, falsifiable predictions — each testable against natural experiments like the staggered Operation Soteria rollout.
V — a structural map for reform
Each recovered structure is converted, via a structural leverage index, into an ownable reform: harmonise measurement before ranking states; make the victim-withdrawal rate the headline policing metric and resource complainant support across the whole report-to-charge process (the empirical case for Operation Soteria); hold forces to a national charging standard with funnel-based monitoring, treating statistical outliers as audit triggers; move recognition upstream of the external review; fund signal quality at the referral and investigation junctions rather than new infrastructure; auto-escalate oversight where a single risk near-determines failure; and legislate against the self-protection archetype. Every reform is anchored to a measured structure and assigned to the level that could enact it, and offered as a testable hypothesis.
Limitations and where it stays honest
The pipeline performs record-integrity analysis — how the record was made and where it broke — and never scores a complainant’s credibility, which is out of scope and excluded by design. Only public, already-anonymised material is used. The outcome models are cross-sectional (one year) and ecological: their honest pseudo-R² shows most case-level variation turns on facts the open data don’t hold, and the force effect is an area-level association, not a demonstrated cause; “victim does not support action” is the police-recorded outcome, never a judgement of the complainant. The failure-level analytics rest on a small, human-validated corpus (n = 11) and are exploratory structure, not population rates; the larger 61-review corpus is model-coded, and only its reliable agency and pathway structure is used at scale. A fully-local 1.2B model recovers which agencies were involved (κ ≈ 0.60) but not the interpretive coding (κ ≈ 0.19–0.07) — stated plainly, and bounded.
Read it
The PDF is the citable paper; this page is its summary. Download the branded PDF above, copy the BibTeX below, and take the datasets and datasheet with you.
@techreport{fat2026three,
title = {Three in a Hundred: Measuring the Sexual-Offence Justice Gap, Diagnosing Its Structure, and Simulating Its Repair},
author = {{Daniel Fat}},
institution = {Inference Institute},
type = {Thesis-format working paper},
year = {2026},
url = {https://inference.institute/papers/three-in-a-hundred},
note = {Accessed: <date>}
}