BasedAGI
developer_tools

Best LLM for Refactoring

Ranked models for safely refactoring code while preserving behavior and improving clarity.

#1 Recommendation

z-ai/glm-4.7

Strong on Sonar Java Quality Leaderboard functional_skill_pct (74%) and Vals LiveCodeBench overall_accuracy_pct (91%)

external/z-ai/glm-4-7

Score: 21.8%
Confidence: 31.4%
Evidence: 16
Ranked Models: 30
Evidence Quality: 81%
Scoring: Benchmark-backed
Top Signal: Sonar Java Quality Leaderboard: functional_skill_pct

All Ranked Models

30 of 30
Rank  Model                                          Score
#1    z-ai/glm-4.7                                   21.8%
      Strong on Sonar Java Quality Leaderboard functional_skill_pct (74%) and Vals LiveCodeBench overall_accuracy_pct (91%)
#2    Kimi K2 Thinking                               21.3%
      Strong on Sonar Java Quality Leaderboard functional_skill_pct (88%) and Sonar Java Quality Leaderboard issue_density_error_per_kloc (67%)
#4    minimax/minimax-m2.1                           20.3%
#6    deepseek/deepseek-r1                           18.2%
#7    gemini-3-pro-preview                           17.9%
#8    google/gemini-3.1-pro-preview                  17.5%
#9    Grok-4-0709                                    17.2%
#10   gpt-4o-2024-05-13                              17.1%
#11   gpt-4.1-20250414                               16.6%
#12   openai/gpt-5.4-2026-03-05                      16.4%
#13   claude-sonnet-4-20250514                       16.1%
#14   anthropic/claude-sonnet-4.6                    16.0%
#16   anthropic/claude-opus-4-6-thinking             15.8%
#17   claude-opus-4-5-20251101                       15.6%
#18   GLM-5                                          15.5%
#19   gemini-3-flash-preview                         15.4%
#20   gpt-5-2025-08-07                               15.3%
#22   gpt-5.2-2025-12-11                             15.2%
#23   gpt-5.1-2025-11-13                             15.2%
#24   anthropic/claude-opus-4-5-20251101-thinking    15.1%
#25   gpt-4o                                         14.4%
#27   kimi/kimi-k2.5-thinking                        13.7%
#29   anthropic/claude-sonnet-4-5-20250929-thinking  13.3%
#31   gpt-4o-2024-08-06                              12.8%
#32   gemini-2.5-pro                                 12.8%
#35   zai/glm-5-thinking                             12.3%
#36   qwen-2.5-72b-instruct                          12.3%
#40   xai-org/grok-4-fast-reasoning                  12.0%
#43   google/gemini-3.1-flash-lite-preview           11.9%
#45   gpt-4o-20241120                                11.9%

Head-to-Head: #1 vs #2

#1 (Top Pick): z-ai/glm-4.7

Strong on Sonar Java Quality Leaderboard functional_skill_pct (74%) and Vals LiveCodeBench overall_accuracy_pct (91%)

Score: 21.8%
Confidence: 31.4%

#2: Kimi K2 Thinking

Strong on Sonar Java Quality Leaderboard functional_skill_pct (88%) and Sonar Java Quality Leaderboard issue_density_error_per_kloc (67%)

Score: 21.3%
Confidence: 30.7%
