BasedAGI

Best LLM for Unit Test Generation

Ranked models for generating meaningful unit tests and edge cases from code.

#1 Recommendation

gpt-4o-2024-05-13

Strong on RepoQA Official Results overall_average_pass_at_1_pct (99%) and RepoQA Official Results all_average_pass_at_1_pct (99%)

external/openai/gpt-4o-2024-05-13

Score: 19.2%
Confidence: 23.9%
Evidence: 9
Ranked Models: 30
Evidence Quality: 80%
Scoring: Benchmark-backed
Top Signal: RepoQA Official Results: overall_average_pass_at_1_pct

All Ranked Models

Rank  Model                                          Score
#2    gpt-4o-2024-05-13                              19.2%
#5    gpt-4.1-20250414                               17.4%
#6    gemini-3-pro-preview                           16.7%
#7    google/gemini-3.1-pro-preview                  16.3%
#8    Grok-4-0709                                    16.1%
#10   openai/gpt-5.4-2026-03-05                      15.3%
#12   claude-sonnet-4-20250514                       15.1%
#13   anthropic/claude-sonnet-4.6                    14.9%
#14   z-ai/glm-4.7                                   14.9%
#15   anthropic/claude-opus-4-6-thinking             14.7%
#17   claude-opus-4-5-20251101                       14.5%
#18   gemini-3-flash-preview                         14.3%
#19   gpt-5-2025-08-07                               14.3%
#20   minimax/minimax-m2.1                           14.2%
#21   gpt-5.2-2025-12-11                             14.2%
#22   gpt-5.1-2025-11-13                             14.2%
#23   anthropic/claude-opus-4-5-20251101-thinking    14.1%
#24   Kimi K2 Thinking                               14.0%
#27   Meta-Llama-3-70B-Instruct                      12.8%
#28   kimi/kimi-k2.5-thinking                        12.8%
#29   gpt-4o-20241120                                12.7%
#31   anthropic/claude-sonnet-4-5-20250929-thinking  12.4%
#34   gpt-4o-2024-08-06                              12.0%
#35   gemini-2.5-pro                                 12.0%
#36   deepseek/deepseek-r1                           11.6%
#38   zai/glm-5-thinking                             11.5%
#39   qwen-2.5-72b-instruct                          11.5%
#40   xai-org/grok-4-fast-reasoning                  11.2%
#41   google/gemini-3.1-flash-lite-preview           11.1%
#43   Meta-Llama-3-8B-Instruct                       11.1%

Head-to-Head: #1 vs #2

#2

Top Pick

gpt-4o-2024-05-13

Strong on RepoQA Official Results overall_average_pass_at_1_pct (99%) and RepoQA Official Results all_average_pass_at_1_pct (99%)

19.2%

Conf 23.9%

#5

gpt-4.1-20250414

Strong on Galileo Agent Leaderboard v2 Avg AC (100%) and MMLongBench-Doc Leaderboard acc_score_pct (75%)

17.4%

Conf 26.4%
