BasedAGI
Rankings live

developer_tools

Best LLM for Autonomous Coding

Benchmark-backed ranking of models for end-to-end autonomous software engineering and issue resolution.

#1 Recommendation

Kimi K2 Thinking

Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct (80%) and Sonar Java Quality Leaderboard functional_skill_pct (88%)

external/kimi/kimi-k2-thinking

Score: 16.8%
Confidence: 42.9%
Evidence: 26

Ranked Models: 25
Evidence Quality: 82%
Scoring: Benchmark-backed
Top Signal: SWE-bench Verified Leaderboard (swe_verified_resolved_pct)

All Ranked Models

25 of 25
Rank  Model                            Score
#8    Kimi K2 Thinking                 16.8%
#9    GLM-5                            16.8%
#10   anthropic/claude-sonnet-4.6      16.6%
#13   gemini-3-pro-preview             15.2%
#15   gemini-2.5-pro                   14.3%
#16   openai/gpt-4.1                   14.1%
#17   kimi/kimi-k2.5-thinking          13.9%
#18   gpt-4.1-20250414                 13.5%
#19   claude-opus-4-5-20251101         13.4%
#21   gpt-5.2-2025-12-11               12.8%
#24   minimax/minimax-m2.1             11.2%
#25   gpt-4o                           11.1%
#26   deepseek/deepseek-r1             10.6%
#28   o3-20250416                      10.1%
#30   Grok-4-0709                      9.2%
#31   claude-sonnet-4-20250514         9.1%
#32   gpt-4.1-mini-20250414            8.9%
#33   gpt-4o-20241120                  8.8%
#34   z-ai/glm-4.7                     8.7%
#35   Kimi-K2-Instruct                 8.6%
#36   gpt-4o-2024-05-13                8.4%
#37   gpt-4o-2024-08-06                8.2%
#39   o4-mini-20250416                 7.6%
#40   GLM-4.7                          7.1%
#48   openai/gpt-4o-mini-2024-07-18    2.6%

Head-to-Head: #1 vs #2

#8 Kimi K2 Thinking (Top Pick)
Strong on SWE-bench Verified Leaderboard swe_verified_resolved_pct (80%) and Sonar Java Quality Leaderboard functional_skill_pct (88%)
Score 16.8%, Conf 42.9%

#9 GLM-5
Strong on OpenHands Issue Resolution issue_resolution_score_pct (59%) and Sonar Java Quality Leaderboard functional_skill_pct (92%)
Score 16.8%, Conf 29.8%

Related Lookups