live
weekly refresh
basedagi.org
benchmark evidence

Vectara HHEM

Hallucination evaluation by Vectara using the Hughes Hallucination Evaluation Model. Factual Consistency Rate (0-100): higher means less hallucination.

winner on Vectara HHEM
direct benchmark result, not a broad vertical composite | source row dated 2000-01-01
scored on 2000-01-01 · stale source data (9646 days old)
latest mapped results | top 20
| # | Model | Provider | Score | Evidence | Tested |
|---|---|---|---|---|---|
| 1 | Meta: Llama 3.3 70B Instruct | Meta Llama | 95.9 | model-only, independent_benchmark | 2000-01-01 |
| 2 | OpenAI: GPT-5.4 Mini | OpenAI | 94.5 | model-only, independent_benchmark | 2000-01-01 |
| 3 | OpenAI: GPT-4.1 | OpenAI | 94.4 | model-only, independent_benchmark | 2000-01-01 |
| 4 | Qwen: Qwen3 32B | Qwen | 94.1 | model-only, independent_benchmark | 2000-01-01 |
| 5 | DeepSeek: DeepSeek V3.2 | DeepSeek | 93.7 | model-only, independent_benchmark | 2000-01-01 |
| 6 | OpenAI: GPT-5.4 | OpenAI | 93.0 | model-only, independent_benchmark | 2000-01-01 |
| 7 | Google: Gemini 2.5 Pro | Google | 93.0 | model-only, independent_benchmark | 2000-01-01 |
| 8 | Google: Gemma 3 27B | Google | 92.6 | model-only, independent_benchmark | 2000-01-01 |
| 9 | Google: Gemini 2.5 Flash | Google | 92.2 | model-only, independent_benchmark | 2000-01-01 |
| 10 | DeepSeek: DeepSeek V4 Pro | DeepSeek | 91.4 | model-only, independent_benchmark | 2000-01-01 |
| 11 | OpenAI: GPT-5.5 | OpenAI | 90.7 | model-only, independent_benchmark | 2000-01-01 |
| 12 | Qwen: Qwen3 235B A22B | Qwen | 90.7 | model-only, independent_benchmark | 2000-01-01 |
| 13 | Anthropic: Claude Sonnet 4 | Anthropic | 89.7 | model-only, independent_benchmark | 2000-01-01 |
| 14 | DeepSeek: R1 | DeepSeek | 88.7 | model-only, independent_benchmark | 2000-01-01 |

what this result means

This benchmark contributes direct public evidence. Read its scope before generalizing the result.

A win here is a win on Vectara HHEM. Broad task pages require independent corroboration before naming a general winner.
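
To make the 0-100 scale concrete, here is a minimal sketch of how a factual consistency rate could be computed from per-summary scores. It assumes each summary receives a consistency score between 0 and 1 and counts as consistent at or above a 0.5 cutoff. The function name, the cutoff, and the sample scores are illustrative assumptions, not details taken from this page or from Vectara's pipeline.

```python
from typing import Sequence


def factual_consistency_rate(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the percentage (0-100) of summaries judged consistent.

    `scores` are hypothetical per-summary consistency scores in [0, 1];
    `threshold` is an assumed cutoff, not a value stated on this page.
    """
    if not scores:
        raise ValueError("need at least one score")
    consistent = sum(1 for s in scores if s >= threshold)
    return 100.0 * consistent / len(scores)


# Made-up scores for four summaries; three clear the assumed cutoff.
example_scores = [0.91, 0.12, 0.77, 0.65]
rate = factual_consistency_rate(example_scores)
hallucination_rate = 100.0 - rate  # the complement of the consistency rate
print(rate, hallucination_rate)
```

Under this reading, a score of 95.9 would mean roughly 96 of every 100 summaries were judged consistent with their source text.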

source record
category: hallucination
metric: accuracy
matched models: 14
latest source date: 2000-01-01
direction: higher is better
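
The staleness figure of 9646 days shown near the top of the page is presumably the gap between the source date and the day the page was rendered. A minimal sketch of that arithmetic follows. The `as_of` date is a hypothetical value chosen only because it reproduces the figure, and the helper name is illustrative.

```python
from datetime import date


def staleness_days(source_date: date, as_of: date) -> int:
    """Number of whole days between the source date and the as-of date."""
    return (as_of - source_date).days


source = date.fromisoformat("2000-01-01")  # latest source date from the record
as_of = date(2026, 5, 30)                  # hypothetical render date (assumption)
age = staleness_days(source, as_of)
print(age)
```

A source date of 2000-01-01 looks like a placeholder rather than a real test date, so an age computed this way says more about missing metadata than about when the models were actually scored.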