BasedAGI

cybersecurity

Best LLM for Vulnerability Review

Compare models for reviewing code for security vulnerabilities and proposing mitigations.

#1 Recommendation

gemini-2.5-pro

Strong on VADER Leaderboard mean_score_pct (81%) and BaxBench Leaderboard average_secure_pass_1_pct (44%)

external/google/gemini-2-5-pro

Score: 21.2%
Confidence: 32.1%
Evidence: 23
Ranked Models: 30
Evidence Quality: 80%
Scoring: Benchmark-backed
Top Signal: VADER Leaderboard: mean_score_pct

All Ranked Models

Rank  Model                          Score
#1    gemini-2.5-pro                 21.2%
#4    Meta-Llama-3-8B-Instruct       16.0%
#5    gpt-4o-2024-05-13              15.8%
#6    Llama-2-7b-chat-hf             15.1%
#8    openai/gpt-4o-mini-2024-07-18  13.1%
#9    deepseek/deepseek-r1           13.1%
#10   gpt-4.1-20250414               12.5%
#11   Kimi K2 Thinking               12.2%
#12   gemma-7b-it                    12.2%
#13   gemma-2b-it                    12.2%
#15   z-ai/glm-4.7                   11.9%
#17   falcon-7b-instruct             11.3%
#19   minimax/minimax-m2.1           11.2%
#21   gemini-3-pro-preview           10.8%
#23   zephyr-7b-beta                 10.4%
#25   GLM-5                          10.4%
#28   Grok-4-0709                    10.2%
#29   claude-sonnet-4-20250514       9.8%
#30   google/gemini-3.1-pro-preview  9.8%
#32   gpt-5-2025-08-07               9.0%
#33   openai/gpt-5.4-2026-03-05      8.9%
#34   gpt-4o                         8.8%
#35   gpt-5.1-2025-11-13             8.6%
#36   anthropic/claude-sonnet-4.6    8.6%
#37   claude-opus-4-5-20251101       8.5%
#38   gpt-5-mini-2025-08-07          8.3%
#39   gemini-3-flash-preview         8.1%
#40   alpaca-native                  8.1%
#41   x-ai/grok-3                    8.0%
#42   Mistral-7B-OpenOrca            8.0%

Head-to-Head: #1 vs #2

#1 (Top Pick): gemini-2.5-pro
Strong on VADER Leaderboard mean_score_pct (81%) and BaxBench Leaderboard average_secure_pass_1_pct (44%)
Score: 21.2%
Confidence: 32.1%

#4: Meta-Llama-3-8B-Instruct
Strong on LLM Trustworthy Leaderboard adv (100%) and LLM Trustworthy Leaderboard privacy (69%)
Score: 16.0%
Confidence: 24.2%
