
Kimi K2 Thinking vs GLM-5

For the Autonomous Coding Agent category

Model A wins by +0.0%

Model A

Winner

Kimi K2 Thinking

external/kimi/kimi-k2-thinking

Score 16.8% · Rank #9

Confidence 42.9% · 26 evidence pts

SWE-bench Verified Leaderboard: swe_verified_resolved_pct

Value 80.2% · Conf 100.0% · Weight 4.2%

swebench_verified_official.swe_verified_resolved_pct (Mar 17, 2026)

Sonar Java Quality Leaderboard: functional_skill_pct

Value 88.4% · Conf 100.0% · Weight 1.8%

sonar_java_quality.functional_skill_pct (Mar 17, 2026)

Vals SWE-bench: overall_accuracy_pct

Value 63.5% · Conf 100.0% · Weight 0.7%

vals_swebench.overall_accuracy_pct (Mar 17, 2026)

Sonar Java Quality Leaderboard: issue_density_error_per_kloc

Value 66.6% · Conf 100.0% · Weight 0.7%

sonar_java_quality.issue_density_error_per_kloc (Mar 17, 2026)

Vals LiveCodeBench: overall_accuracy_pct

Value 65.1% · Conf 100.0% · Weight 0.7%

vals_lcb.overall_accuracy_pct (Mar 17, 2026)

Model B

GLM-5

zai-org/GLM-5

Score 16.8% · Rank #10

Confidence 29.8% · 17 evidence pts

OpenHands Issue Resolution: issue_resolution_score_pct

Value 59.0% · Conf 100.0% · Weight 2.4%

openhands_issue_resolution.issue_resolution_score_pct (Mar 17, 2026)

Sonar Java Quality Leaderboard: functional_skill_pct

Value 91.6% · Conf 100.0% · Weight 1.8%

sonar_java_quality.functional_skill_pct (Mar 17, 2026)

OpenHands Index: average_score_pct

Value 36.5% · Conf 100.0% · Weight 1.4%

openhands_index.average_score_pct (Mar 17, 2026)

Sonar Java Quality Leaderboard: issue_density_error_per_kloc

Value 100.0% · Conf 100.0% · Weight 1.1%

sonar_java_quality.issue_density_error_per_kloc (Mar 17, 2026)

OpenHands Index: information_gathering_score_pct

Value 70.0% · Conf 100.0% · Weight 0.8%

openhands_index.information_gathering_score_pct (Mar 17, 2026)
