data_analytics
qwen-2.5-72b-instruct vs gpt-4o
Model A wins by +1.4%
Rank #1
Confidence: 31.9% · Evidence: 12 pts
DuckDB NSQL Leaderboard: all_execution_accuracy
Value 82.7% · Conf 100.0% · Weight 5.9%
duckdb_nsql_leaderboard.all_execution_accuracy (Mar 12, 2026)
JSONSchemaBench Leaderboard: medium_schema_compliance_pct
Value 90.1% · Conf 100.0% · Weight 3.4%
jsonschemabench_leaderboard.medium_schema_compliance_pct (Mar 12, 2026)
JSONSchemaBench Leaderboard: hard_schema_compliance_pct
Value 74.4% · Conf 100.0% · Weight 1.9%
jsonschemabench_leaderboard.hard_schema_compliance_pct (Mar 12, 2026)
Galileo Agent Leaderboard v2: Avg AC
Value 76.1% · Conf 100.0% · Weight 1.2%
galileo_agent_v2.avg_ac (Mar 12, 2026)
LLM-AggreFact Leaderboard: average_score_pct
Value 40.0% · Conf 100.0% · Weight 0.9%
llm_aggrefact_leaderboard.average_score_pct (Mar 12, 2026)
Rank #2
Confidence: 32.1% · Evidence: 14 pts
DuckDB NSQL Leaderboard: all_execution_accuracy
Value 76.9% · Conf 100.0% · Weight 5.5%
duckdb_nsql_leaderboard.all_execution_accuracy (Mar 12, 2026)
JSONSchemaBench Leaderboard: medium_schema_compliance_pct
Value 100.0% · Conf 100.0% · Weight 3.7%
jsonschemabench_leaderboard.medium_schema_compliance_pct (Mar 12, 2026)
JSONSchemaBench Leaderboard: hard_schema_compliance_pct
Value 100.0% · Conf 100.0% · Weight 2.5%
jsonschemabench_leaderboard.hard_schema_compliance_pct (Mar 12, 2026)
DuckDB NSQL Leaderboard: hard_execution_accuracy
Value 50.0% · Conf 100.0% · Weight 1.1%
duckdb_nsql_leaderboard.hard_execution_accuracy (Mar 12, 2026)
MEGA-Bench: overall_score
Value 92.8% · Conf 100.0% · Weight 0.6%
mega_bench.overall_score (Mar 12, 2026)
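Only three metrics appear in both evidence lists: DuckDB NSQL all_execution_accuracy and the two JSONSchemaBench compliance scores. The minimal Python sketch below computes the per-metric gap between the Rank #1 and Rank #2 entries using only the values listed above. The dictionary names and the `shared_deltas` helper are hypothetical, added here for illustration. The sketch does not reproduce the dashboard's aggregate score or its +1.4% margin, because the weighting and aggregation formula is not given.

```python
# Evidence values copied from the two lists above, in percent.
# RANK_1, RANK_2 and shared_deltas are illustrative names, not part of the dashboard.
RANK_1 = {
    "duckdb_nsql_leaderboard.all_execution_accuracy": 82.7,
    "jsonschemabench_leaderboard.medium_schema_compliance_pct": 90.1,
    "jsonschemabench_leaderboard.hard_schema_compliance_pct": 74.4,
    "galileo_agent_v2.avg_ac": 76.1,
    "llm_aggrefact_leaderboard.average_score_pct": 40.0,
}
RANK_2 = {
    "duckdb_nsql_leaderboard.all_execution_accuracy": 76.9,
    "jsonschemabench_leaderboard.medium_schema_compliance_pct": 100.0,
    "jsonschemabench_leaderboard.hard_schema_compliance_pct": 100.0,
    "duckdb_nsql_leaderboard.hard_execution_accuracy": 50.0,
    "mega_bench.overall_score": 92.8,
}


def shared_deltas(a, b):
    """Return {metric: a - b}, in percentage points, for metrics present in both lists."""
    # dict.keys() views support set intersection; sort for stable output order.
    return {m: round(a[m] - b[m], 1) for m in sorted(a.keys() & b.keys())}


if __name__ == "__main__":
    for metric, delta in shared_deltas(RANK_1, RANK_2).items():
        print(f"{metric}: {delta:+.1f} pp")
```

Running this prints a +5.8 pp gap on all_execution_accuracy in favor of the Rank #1 entry, and gaps of -9.9 pp (medium) and -25.6 pp (hard) on schema compliance, which favor the Rank #2 entry. The remaining metrics appear in only one list, so they cannot be compared directly from this page.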