risk_eval
Crisis escalation protocol (eval)
Measure safe crisis escalation behavior under the selected policy.
#1 Recommendation
Llama-2-7b-chat-hf
Strong on LLM Trustworthy Leaderboard fairness (100%) and privacy (100%)
meta-llama/Llama-2-7b-chat-hf
Score: 26.3%
Confidence: 32.8%
Limited benchmark evidence for this use case.
62 ranked models with average evidence of 11.8 points. Rankings may shift as more benchmark data is ingested.
Ranked Models: 30
Evidence Quality: 82%
Scoring: Benchmark-backed
Top Signal: LLM Trustworthy Leaderboard: fairness
All Ranked Models
Compare Models
Model A leads by +3.4%
Model A
Llama-2-7b-chat-hf
meta-llama/Llama-2-7b-chat-hf
Rank #1
LLM Trustworthy Leaderboard: fairness
Value 100.0% · Conf 100.0% · Weight 6.0%
llm_trustworthy_leaderboard.fairness (Mar 12, 2026)
LLM Trustworthy Leaderboard: privacy
Value 100.0% · Conf 100.0% · Weight 5.1%
llm_trustworthy_leaderboard.privacy (Mar 12, 2026)
LLM Trustworthy Leaderboard: adv
Value 59.8% · Conf 100.0% · Weight 2.8%
llm_trustworthy_leaderboard.adv (Mar 12, 2026)
LLM Trustworthy Leaderboard: toxicity
Value 50.0% · Conf 100.0% · Weight 1.6%
llm_trustworthy_leaderboard.toxicity (Mar 12, 2026)
Model B
Meta-Llama-3-8B-Instruct
meta-llama/Meta-Llama-3-8B-Instruct
Rank #3
LLM Trustworthy Leaderboard: adv
Value 100.0% · Conf 100.0% · Weight 4.7%
llm_trustworthy_leaderboard.adv (Mar 12, 2026)
LLM Trustworthy Leaderboard: privacy
Value 69.0% · Conf 100.0% · Weight 3.5%
llm_trustworthy_leaderboard.privacy (Mar 12, 2026)
LLM Trustworthy Leaderboard: fairness
Value 46.8% · Conf 100.0% · Weight 2.8%
llm_trustworthy_leaderboard.fairness (Mar 12, 2026)
LLM Trustworthy Leaderboard: toxicity
Value 50.0% · Conf 100.0% · Weight 1.6%
llm_trustworthy_leaderboard.toxicity (Mar 12, 2026)
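The per-signal rows above each carry a value, a confidence, and a weight. A minimal sketch of how such rows could be combined into a composite, assuming a simple weight-normalized sum (the report does not state its actual formula, and the four rows shown per model cover only a small share of total weight, so this will not reproduce the headline 26.3% score or the +3.4% lead):

```python
def weighted_score(evidence):
    """Weight-normalized composite over (value, confidence, weight) rows.

    Hypothetical aggregation, not confirmed by the report:
    sum(value * weight) / sum(weight).
    """
    total_weight = sum(w for _, _, w in evidence)
    if total_weight == 0:
        return 0.0
    return sum(v * w for v, _, w in evidence) / total_weight

# Evidence rows copied from the report: (value %, conf %, weight %)
model_a = [  # Llama-2-7b-chat-hf
    (100.0, 100.0, 6.0),  # fairness
    (100.0, 100.0, 5.1),  # privacy
    (59.8, 100.0, 2.8),   # adv
    (50.0, 100.0, 1.6),   # toxicity
]
model_b = [  # Meta-Llama-3-8B-Instruct
    (100.0, 100.0, 4.7),  # adv
    (69.0, 100.0, 3.5),   # privacy
    (46.8, 100.0, 2.8),   # fairness
    (50.0, 100.0, 1.6),   # toxicity
]

print(f"A: {weighted_score(model_a):.1f}, B: {weighted_score(model_b):.1f}")
```

On the displayed rows alone this gives roughly 87.6 for Model A and 73.2 for Model B; the reported +3.4% lead comes from the full weight set, most of which is not shown here.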
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 62
Sources: 8
Quality: Insufficient
Vals Legal Bench (vals_legal_bench): 42 rows, 0.6% avg lift
Vals CorpFin v2 (vals_corp_fin_v2): 42 rows, 0.5% avg lift
Vals Tax Eval v2 (vals_tax_eval_v2): 41 rows, 0.6% avg lift
Vals GPQA (vals_gpqa): 41 rows, 0.5% avg lift
Missing Strong Models
No obvious gaps right now.
Taxonomy Details
Core Tasks
Required Modes
Domains
Related Use Cases
risk_eval
Disinformation and manipulation resistance (eval)
Measure refusal and safe handling of deceptive content generation requests.
Top: gemini-3-pro-preview
risk_eval
Jailbreak resistance (eval)
Measure robustness to adversarial prompts that attempt to bypass policy.
Top: Llama-2-7b-chat-hf
risk_eval
Overrefusal (eval)
Measure how often benign requests are incorrectly refused.
Top: Llama-2-7b-chat-hf
risk_eval
Refusal profile (eval)
Measure refusal/overrefusal rates across predefined categories.
Top: Llama-2-7b-chat-hf