AI FAILURE DATABASE

Documented AI Failures

46 verified cases across 6 industries. Real costs, real consequences, and how deterministic verification could have prevented each one.


HUMANITY’S LAST EXAM

The best AI in the world still gets roughly 63% of the questions wrong.

HLE is designed to be one of the hardest AI benchmarks ever built: 2,500 expert-level questions across 100 subjects. No model listed below scores above 40%.

Gemini 3 Pro: 37.5%
GPT-5.1: 33.2%
Claude Opus 4.6: 31.8%
Grok 4: 28.6%
AI + ZH-1: benchmark in progress
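The headline figure is the complement of the top score: a model that answers 37.5% of questions correctly gets 62.5% wrong, which rounds to about 63%. A minimal Python sketch using only the scores listed above, assuming every question not graded correct counts as a failure:

```python
# HLE accuracy scores as listed above (percent of questions graded correct).
scores = {
    "Gemini 3 Pro": 37.5,
    "GPT-5.1": 33.2,
    "Claude Opus 4.6": 31.8,
    "Grok 4": 28.6,
}

def error_rate(accuracy_pct: float) -> float:
    """Percent of questions not graded correct, rounded to one decimal place."""
    return round(100.0 - accuracy_pct, 1)

for model, acc in scores.items():
    print(f"{model}: {acc}% correct, {error_rate(acc)}% incorrect")
```

Even the lowest error rate in the list (Gemini 3 Pro, 62.5%) means most questions are still answered incorrectly.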

AI Judging AI

These scores are graded automatically by OpenAI’s o3-mini, from a model family with a 51% hallucination rate on factual questions. An independent audit found that 18–29% of HLE’s science answers contradict peer-reviewed literature (FutureHouse / Scale AI, 2025).

    $ zh verify --input claim.json
    ✓ Source cross-referenced against 3 databases
    ✓ Citation DOI confirmed via CrossRef
    ✓ Factual claim matches peer-reviewed source
    ✓ SHA-256 hash: a7f3...c912
    VERIFIED — 4/4 checks passed
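The transcript does not say how the DOI check works internally. As a hedged illustration of what such a check could involve, here is a minimal Python sketch that validates DOI syntax offline and builds the lookup URL for Crossref's public REST API (the `/works/{doi}` route). The function names `looks_like_doi` and `crossref_lookup_url` are hypothetical and are not part of ZH-1; the regex follows the pattern Crossref has suggested for matching most modern DOIs.

```python
import re
from urllib.parse import quote

# Based on the pattern Crossref has suggested for modern DOIs; case-insensitive.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

def looks_like_doi(doi: str) -> bool:
    """Offline syntax check only: says nothing about whether the DOI is registered."""
    return bool(DOI_PATTERN.match(doi.strip()))

def crossref_lookup_url(doi: str) -> str:
    """Build the Crossref REST API URL for a DOI (hypothetical helper).

    Fetching this URL and getting HTTP 200 would confirm Crossref holds a record;
    a 404 would mean it does not. The network call is deliberately left out here.
    """
    if not looks_like_doi(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep "/" unescaped; percent-encode anything else that is unsafe in a URL path.
    return "https://api.crossref.org/works/" + quote(doi.strip(), safe="/")
```

For example, `crossref_lookup_url("10.1038/nature14539")` returns `https://api.crossref.org/works/10.1038/nature14539`. Splitting the syntax check from the lookup keeps malformed citations from ever reaching the network.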

Case Database

Every failure. Every fix.

Each case shows how the ZH Standard could have prevented it.


STOP TRUSTING. START VERIFYING.

Every case above was preventable.

ZH-1 catches both AI hallucinations and human data manipulation, and keeps a SHA-256 audit trail recording every check it performs.
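One common way to make a SHA-256 audit trail tamper-evident is a hash chain, where each entry's hash also covers the previous entry's hash, so changing any earlier record breaks every hash after it. The sketch below assumes that design; the entry layout and the names `append`, `entry_hash`, and `verify_chain` are hypothetical and are not ZH-1's documented format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, compact separators) so equal records hash identically.
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log: list, record: dict) -> None:
    """Add a record, linking it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})

def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, reordered, or removed entry makes this False."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Usage: append one entry per check (for example `append(log, {"check": "doi", "result": "pass"})`), then call `verify_chain(log)` before trusting the log. A chain like this shows the record was not altered after the fact; detecting a rewrite of the entire chain also requires storing the latest hash somewhere the editor cannot change.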


All cases are sourced from public court records, news coverage, and official benchmarks. Updated Feb 2026.