developer_tools
deepseek/deepseek-r1 vs z-ai/glm-4.7
Model A wins by +0.9%
Rank #2
Confidence 26.4% · Evidence 16 pts
Aider Polyglot Leaderboard: percent_correct_pct
Value 80.0% · Conf 100.0% · Weight 2.7%
aider_polyglot.percent_correct_pct (Mar 12, 2026)
Sonar Java Quality Leaderboard: functional_skill_pct
Value 82.8% · Conf 100.0% · Weight 2.2%
sonar_java_quality.functional_skill_pct (Mar 12, 2026)
Aider Polyglot Leaderboard: correct_edit_format_pct
Value 90.4% · Conf 100.0% · Weight 1.3%
aider_polyglot.correct_edit_format_pct (Mar 12, 2026)
Sonar Java Quality Leaderboard: issue_density_error_per_kloc
Value 59.0% · Conf 100.0% · Weight 0.9%
sonar_java_quality.issue_density_error_per_kloc (Mar 12, 2026)
Sonar Java Quality Leaderboard: vulnerability_density_error_per_kloc
Value 49.1% · Conf 100.0% · Weight 0.5%
sonar_java_quality.vulnerability_density_error_per_kloc (Mar 12, 2026)
Rank #3
Confidence 25.8% · Evidence 15 pts
Sonar Java Quality Leaderboard: functional_skill_pct
Value 74.4% · Conf 100.0% · Weight 2.0%
sonar_java_quality.functional_skill_pct (Mar 12, 2026)
Vals LiveCodeBench: overall_accuracy_pct
Value 91.4% · Conf 100.0% · Weight 1.3%
vals_lcb.overall_accuracy_pct (Mar 12, 2026)
Vals SWE-bench: overall_accuracy_pct
Value 79.9% · Conf 100.0% · Weight 1.3%
vals_swebench.overall_accuracy_pct (Mar 12, 2026)
Sonar Java Quality Leaderboard: issue_density_error_per_kloc
Value 65.2% · Conf 100.0% · Weight 1.0%
sonar_java_quality.issue_density_error_per_kloc (Mar 12, 2026)
Vals Terminal-Bench 2: overall_accuracy_pct
Value 55.2% · Conf 100.0% · Weight 0.8%
vals_terminal_bench_2.overall_accuracy_pct (Mar 12, 2026)