BasedAGI
Rankings live

risk_eval

Scam and social engineering resistance (eval)

Measures refusal rates and safe handling of deception and scam requests.

#1 Recommendation

Llama-2-7b-chat-hf

Strong on LLM Trustworthy Leaderboard fairness (100%) and privacy (100%)

meta-llama/Llama-2-7b-chat-hf

Score: 26.3%

Confidence: 32.8%

Limited benchmark evidence for this use case.

62 ranked models with average evidence of 11.8 points. Rankings may shift as more benchmark data is ingested.

Ranked Models: 30

Evidence Quality: 82%

Scoring: Benchmark-backed

Top Signal: LLM Trustworthy Leaderboard: fairness

All Ranked Models

Rank  Model  Score
#1   Llama-2-7b-chat-hf  26.3%
     Strong on LLM Trustworthy Leaderboard fairness (100%) and privacy (100%)
#3   Meta-Llama-3-8B-Instruct  22.9%
     Strong on LLM Trustworthy Leaderboard adv (100%) and privacy (69%)
#4   openai/gpt-4o-mini-2024-07-18  22.5%
#6   gpt-4o-2024-05-13  21.5%
#7   gemma-7b-it  21.4%
#8   gemma-2b-it  21.4%
#10  gpt-4.1-20250414  20.6%
#11  falcon-7b-instruct  20.0%
#12  gemini-2.5-pro  19.9%
#13  Grok-4-0709  19.9%
#14  zephyr-7b-beta  18.5%
#16  gemini-3-pro-preview  18.1%
#18  google/gemini-3.1-pro-preview  16.4%
#19  claude-sonnet-4-20250514  16.4%
#21  gpt-5-2025-08-07  15.1%
#22  openai/gpt-5.4-2026-03-05  14.8%
#23  xai-org/grok-4-1-fast-reasoning  14.8%
#24  xai-org/grok-4-fast-reasoning  14.7%
#25  alpaca-native  14.5%
#26  gpt-5.1-2025-11-13  14.4%
#27  anthropic/claude-sonnet-4.6  14.3%
#28  claude-opus-4-5-20251101  14.2%
#29  gpt-5-mini-2025-08-07  13.9%
#30  Mistral-7B-OpenOrca  13.8%
#31  anthropic/claude-opus-4-6-thinking  13.6%
#32  gemini-3-flash-preview  13.5%
#33  gpt-5.2-2025-12-11  13.4%
#34  anthropic/claude-opus-4-5-20251101-thinking  13.1%
#35  gemini-2.5-flash  12.9%
#36  kimi/kimi-k2.5-thinking  12.2%

Compare Models

Model A leads by +3.4%
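The headline lead is simply the difference between the two models' displayed scores (26.3% for Model A, 22.9% for Model B). A minimal check in Python:

```python
# Headline scores from the comparison (percentage points).
model_a_score = 26.3  # Llama-2-7b-chat-hf
model_b_score = 22.9  # Meta-Llama-3-8B-Instruct

# Lead of Model A over Model B, rounded to one decimal place.
lead = round(model_a_score - model_b_score, 1)
print(f"Model A leads by +{lead}%")  # Model A leads by +3.4%
```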


Model A

Llama-2-7b-chat-hf

meta-llama/Llama-2-7b-chat-hf

26.3%

Rank #1

Confidence 32.8% · 5 evidence pts

LLM Trustworthy Leaderboard: fairness

Value 100.0% · Conf 100.0% · Weight 6.0%

llm_trustworthy_leaderboard.fairness (Mar 12, 2026)

LLM Trustworthy Leaderboard: privacy

Value 100.0% · Conf 100.0% · Weight 5.1%

llm_trustworthy_leaderboard.privacy (Mar 12, 2026)

LLM Trustworthy Leaderboard: adv

Value 59.8% · Conf 100.0% · Weight 2.8%

llm_trustworthy_leaderboard.adv (Mar 12, 2026)

LLM Trustworthy Leaderboard: toxicity

Value 50.0% · Conf 100.0% · Weight 1.6%

llm_trustworthy_leaderboard.toxicity (Mar 12, 2026)

Model B

Meta-Llama-3-8B-Instruct

meta-llama/Meta-Llama-3-8B-Instruct

22.9%

Rank #3

Confidence 35.1% · 7 evidence pts

LLM Trustworthy Leaderboard: adv

Value 100.0% · Conf 100.0% · Weight 4.7%

llm_trustworthy_leaderboard.adv (Mar 12, 2026)

LLM Trustworthy Leaderboard: privacy

Value 69.0% · Conf 100.0% · Weight 3.5%

llm_trustworthy_leaderboard.privacy (Mar 12, 2026)

LLM Trustworthy Leaderboard: fairness

Value 46.8% · Conf 100.0% · Weight 2.8%

llm_trustworthy_leaderboard.fairness (Mar 12, 2026)

LLM Trustworthy Leaderboard: toxicity

Value 50.0% · Conf 100.0% · Weight 1.6%

llm_trustworthy_leaderboard.toxicity (Mar 12, 2026)
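Each signal row above lists a Value, a Confidence, and a Weight. Assuming each signal contributes Value × Weight to the overall score (an assumption for illustration, not the site's documented formula), the listed signals can be summed. Note this covers only the top signals shown, so it does not reproduce the full headline scores:

```python
# Top signals per model as (value %, weight %) pairs, copied from the
# comparison panel. The value * weight aggregation is an assumed formula,
# not the site's documented scoring method.
model_a_signals = {
    "fairness": (100.0, 6.0),
    "privacy":  (100.0, 5.1),
    "adv":      (59.8, 2.8),
    "toxicity": (50.0, 1.6),
}
model_b_signals = {
    "adv":      (100.0, 4.7),
    "privacy":  (69.0, 3.5),
    "fairness": (46.8, 2.8),
    "toxicity": (50.0, 1.6),
}

def partial_score(signals):
    # Sum of value * weight, with weight treated as a fraction (weight % / 100).
    return sum(value * weight / 100.0 for value, weight in signals.values())

print(round(partial_score(model_a_signals), 2))  # 13.57
print(round(partial_score(model_b_signals), 2))  # 9.23
```

Under this assumed formula the listed signals account for roughly half of each headline score, which is consistent with the panel showing only the top four of a larger signal set.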

Ranking Diagnostics & Missing Models

Source Lift

Ranked: 62

Sources: 8

Quality: Insufficient

Vals Legal Bench (vals_legal_bench): 42 rows, 0.6% avg lift

Vals CorpFin v2 (vals_corp_fin_v2): 42 rows, 0.5% avg lift

Vals Tax Eval v2 (vals_tax_eval_v2): 41 rows, 0.6% avg lift

Vals GPQA (vals_gpqa): 41 rows, 0.5% avg lift

Missing Strong Models

No obvious gaps right now.

Taxonomy Details

Core Tasks

task.refusal_rate_by_category, task.jailbreak_resistance
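The core task task.refusal_rate_by_category amounts to grouping labeled model responses by scam category and dividing refusals by totals. A minimal sketch, assuming a simple (category, refused) record shape; the category names and data layout are illustrative, not the site's schema:

```python
from collections import defaultdict

# Hypothetical evaluation records: (category, refused) pairs.
records = [
    ("phishing", True),
    ("phishing", True),
    ("phishing", False),
    ("romance_scam", True),
    ("romance_scam", True),
]

def refusal_rate_by_category(records):
    # Count refusals and totals per category, then divide.
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for category, refused in records:
        totals[category] += 1
        if refused:
            refusals[category] += 1
    return {cat: refusals[cat] / totals[cat] for cat in totals}

print(refusal_rate_by_category(records))
# {'phishing': 0.6666666666666666, 'romance_scam': 1.0}
```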

Required Modes

none

Domains

domain.general_business, domain.customer_support

Related Use Cases