BasedAGI

developer_tools

Best LLM for Code Generation

Benchmark-backed ranking of models for generating correct, secure code from requirements.

#1 Recommendation

anthropic/claude-sonnet-4.6

Strong on OpenHands Issue Resolution (issue_resolution_score_pct: 72%) and OpenHands Index (issue_resolution_score_pct: 72%).

external/anthropic/claude-sonnet-4-6

Score: 19.7%

Confidence: 33.8%

Evidence: 26

Ranked Models: 18

Evidence Quality: 82%

Scoring: Benchmark-backed

Top Signal: OpenHands Issue Resolution: issue_resolution_score_pct

All Ranked Models

Showing 18 of 18

Rank  Model                            Score
#7    anthropic/claude-sonnet-4.6      19.7%
#10   Kimi K2 Thinking                 16.3%
#11   minimax/minimax-m2.1             15.8%
#12   kimi/kimi-k2.5-thinking          14.1%
#13   deepseek/deepseek-r1             14.1%
#14   z-ai/glm-4.7                     13.6%
#17   gemini-3-pro-preview             11.3%
#18   GLM-5                            11.1%
#24   gpt-4.1-20250414                 9.7%
#25   gpt-4o-2024-08-06                9.3%
#26   gpt-4o                           9.1%
#29   Grok-4-0709                      9.0%
#31   gpt-4o-2024-05-13                8.9%
#35   claude-sonnet-4-20250514         8.7%
#43   gpt-4o-20241120                  8.2%
#50   GLM-4.7                          7.6%
#58   gemini-2.5-pro                   6.7%
#76   openai/gpt-4o-mini-2024-07-18    3.1%

Head-to-Head: #1 vs #2

#7 anthropic/claude-sonnet-4.6 (Top Pick)

Strong on OpenHands Issue Resolution (issue_resolution_score_pct: 72%) and OpenHands Index (issue_resolution_score_pct: 72%).

Score: 19.7%

Confidence: 33.8%

#10 Kimi K2 Thinking

Strong on Sonar Java Quality Leaderboard (functional_skill_pct: 88%) and Sonar Java Quality Leaderboard (issue_density_error_per_kloc: 67%).

Score: 16.3%

Confidence: 43.5%
