developer_tools
Kimi K2 Thinking vs GLM-5
Model A wins by +0.0%
Rank #9
Confidence 42.9% · Evidence 26 pts
SWE-bench Verified Leaderboard: swe_verified_resolved_pct
Value 80.2% · Conf 100.0% · Weight 4.2%
swebench_verified_official.swe_verified_resolved_pct (Mar 17, 2026)
Sonar Java Quality Leaderboard: functional_skill_pct
Value 88.4% · Conf 100.0% · Weight 1.8%
sonar_java_quality.functional_skill_pct (Mar 17, 2026)
Vals SWE-bench: overall_accuracy_pct
Value 63.5% · Conf 100.0% · Weight 0.7%
vals_swebench.overall_accuracy_pct (Mar 17, 2026)
Sonar Java Quality Leaderboard: issue_density_error_per_kloc
Value 66.6% · Conf 100.0% · Weight 0.7%
sonar_java_quality.issue_density_error_per_kloc (Mar 17, 2026)
Vals LiveCodeBench: overall_accuracy_pct
Value 65.1% · Conf 100.0% · Weight 0.7%
vals_lcb.overall_accuracy_pct (Mar 17, 2026)
Rank #10
Confidence 29.8% · Evidence 17 pts
OpenHands Issue Resolution: issue_resolution_score_pct
Value 59.0% · Conf 100.0% · Weight 2.4%
openhands_issue_resolution.issue_resolution_score_pct (Mar 17, 2026)
Sonar Java Quality Leaderboard: functional_skill_pct
Value 91.6% · Conf 100.0% · Weight 1.8%
sonar_java_quality.functional_skill_pct (Mar 17, 2026)
OpenHands Index: average_score_pct
Value 36.5% · Conf 100.0% · Weight 1.4%
openhands_index.average_score_pct (Mar 17, 2026)
Sonar Java Quality Leaderboard: issue_density_error_per_kloc
Value 100.0% · Conf 100.0% · Weight 1.1%
sonar_java_quality.issue_density_error_per_kloc (Mar 17, 2026)
OpenHands Index: information_gathering_score_pct
Value 70.0% · Conf 100.0% · Weight 0.8%
openhands_index.information_gathering_score_pct (Mar 17, 2026)