AI FAILURE DATABASE

Documented AI Failures

46 verified cases across 6 industries. Real costs, real consequences, and how deterministic verification prevents each one.

HUMANITY’S LAST EXAM

The best AI in the world fails 63% of the time.

HLE is the hardest AI benchmark ever created — 2,500 expert-level questions across 100 subjects. No AI comes close to passing.

Gemini 3 Pro: 37.5%
GPT-5.1: 33.2%
Claude Opus 4.6: 31.8%
Grok 4: 28.6%
AI + ZH-1: benchmark in progress
AI Judging AI

These scores are graded by OpenAI’s o3-mini — from a model family with a 51% hallucination rate on factual questions. An independent audit found 18–29% of HLE’s science answers contradict peer-reviewed literature (FutureHouse / Scale AI, 2025).

$ zh verify --input claim.json
Source cross-referenced against 3 databases
Citation DOI confirmed via CrossRef
Factual claim matches peer-reviewed source
SHA-256 hash: a7f3...c912
VERIFIED — 4/4 checks passed
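The checks above can be sketched in code. This is a minimal, hypothetical illustration only: the internals of `zh verify` are not public, and the function names (`claim_fingerprint`, `crossref_lookup_url`) are invented for this sketch. It shows the two mechanical pieces the output implies: resolving a citation DOI through CrossRef's public REST API, and fingerprinting the claim with SHA-256 over a canonical serialization so the same claim always hashes the same way.

```python
import hashlib
import json

# Public CrossRef REST endpoint for DOI metadata lookups.
CROSSREF_API = "https://api.crossref.org/works/"

def claim_fingerprint(claim: dict) -> str:
    """SHA-256 over a canonical JSON serialization: sorted keys and
    fixed separators, so key order never changes the hash."""
    canonical = json.dumps(claim, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def crossref_lookup_url(doi: str) -> str:
    """Build the CrossRef URL that would confirm the cited DOI exists.
    DOIs are case-insensitive, so normalize to lowercase."""
    return CROSSREF_API + doi.strip().lower()

# Illustrative claim record (structure assumed, not the real schema).
claim = {"text": "Example factual claim", "doi": "10.1000/demo"}
print(claim_fingerprint(claim)[:8])
print(crossref_lookup_url(claim["doi"]))
```

Fetching the lookup URL and comparing the returned metadata against the claim would be the network half of the check; the fingerprint is what a report could later display as the audit hash.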

Case Database

Every failure. Every fix.

Click any case to see exactly how the ZH Standard could have prevented it.


STOP TRUSTING. START VERIFYING.

Every case above was preventable.

ZH-1 catches hallucinations AND human data manipulation — with a SHA-256 audit trail that proves every check was performed.
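One common way to make an audit trail tamper-evident, which a system like this could plausibly use, is hash chaining: each check record's SHA-256 hash covers the previous record's hash, so altering any earlier entry breaks every hash after it. The sketch below is an assumption about the technique, not ZH-1's actual implementation; `append_check` and `chain_intact` are illustrative names.

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel "previous hash" for the first record

def append_check(log: list, check_name: str, result: str) -> list:
    """Append a check record whose hash chains to the previous record."""
    prev = log[-1]["hash"] if log else GENESIS
    record = {"check": check_name, "result": result, "prev": prev}
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["hash"] = hashlib.sha256(prev.encode("utf-8") + payload).hexdigest()
    log.append(record)
    return log

def chain_intact(log: list) -> bool:
    """Recompute every hash; any edited record breaks the chain."""
    prev = GENESIS
    for rec in log:
        body = {k: rec[k] for k in ("check", "result", "prev")}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        expected = hashlib.sha256(prev.encode("utf-8") + payload).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

log = []
append_check(log, "doi_confirmed", "pass")
append_check(log, "source_cross_referenced", "pass")
print(chain_intact(log))
```

Because each hash depends on all prior records, publishing only the final hash is enough for an auditor with the full log to verify that every check was performed and none was altered after the fact.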

Start Free

All cases sourced from public court records, news coverage, and official benchmarks. • Updated Feb 2026