anthropic/claude-sonnet-4.6 vs Kimi K2 Thinking
For Code generation
Model A wins by +3.4%
Rank #7 · Confidence 33.8% · Evidence 26 pts
OpenHands Issue Resolution: issue_resolution_score_pct
Value 71.8% · Conf 100.0% · Weight 2.4%
openhands_issue_resolution.issue_resolution_score_pct (Mar 12, 2026)
OpenHands Index: issue_resolution_score_pct
Value 71.8% · Conf 100.0% · Weight 2.0%
openhands_index.issue_resolution_score_pct (Mar 12, 2026)
OpenHands Index: greenfield_score_pct
Value 75.2% · Conf 100.0% · Weight 1.4%
openhands_index.greenfield_score_pct (Mar 12, 2026)
Vals SWE-bench: overall_accuracy_pct
Value 95.1% · Conf 100.0% · Weight 1.3%
vals_swebench.overall_accuracy_pct (Mar 12, 2026)
Vals LiveCodeBench: overall_accuracy_pct
Value 91.2% · Conf 100.0% · Weight 1.1%
vals_lcb.overall_accuracy_pct (Mar 12, 2026)
Rank #10 · Confidence 43.5% · Evidence 26 pts
Sonar Java Quality Leaderboard: functional_skill_pct
Value 88.4% · Conf 100.0% · Weight 2.8%
sonar_java_quality.functional_skill_pct (Mar 12, 2026)
Sonar Java Quality Leaderboard: issue_density_error_per_kloc
Value 66.6% · Conf 100.0% · Weight 1.5%
sonar_java_quality.issue_density_error_per_kloc (Mar 12, 2026)
Sonar Java Quality Leaderboard: vulnerability_density_error_per_kloc
Value 61.4% · Conf 100.0% · Weight 1.0%
sonar_java_quality.vulnerability_density_error_per_kloc (Mar 12, 2026)
Vals SWE-bench: overall_accuracy_pct
Value 63.5% · Conf 100.0% · Weight 0.9%
vals_swebench.overall_accuracy_pct (Mar 12, 2026)
Vals LiveCodeBench: overall_accuracy_pct
Value 65.1% · Conf 100.0% · Weight 0.8%
vals_lcb.overall_accuracy_pct (Mar 12, 2026)