data_analytics
gpt-4o-20241120 vs gpt-4o
For SQL debugging
Model A wins by +4.2%
Rank #1
Confidence: 44.7%
Evidence: 15 pts
DuckDB NSQL Leaderboard: all_execution_accuracy
Value 96.2% · Conf 100.0% · Weight 7.6%
duckdb_nsql_leaderboard.all_execution_accuracy (Mar 12, 2026)
DuckDB NSQL Leaderboard: hard_execution_accuracy
Value 75.0% · Conf 100.0% · Weight 4.3%
duckdb_nsql_leaderboard.hard_execution_accuracy (Mar 12, 2026)
BIRD-CRITIC: success_rate_open_pct
Value 55.6% · Conf 100.0% · Weight 2.5%
bird_critic.success_rate_open_pct (Mar 12, 2026)
MMLongBench-Doc Leaderboard: acc_score_pct
Value 62.7% · Conf 100.0% · Weight 1.1%
mmlongbench_doc_leaderboard.acc_score_pct (Mar 12, 2026)
Spider2.0 Snow Text-to-SQL: snow_text_to_sql_score_pct
Value 13.5% · Conf 100.0% · Weight 0.9%
spider2_snow_text_to_sql.snow_text_to_sql_score_pct (Mar 12, 2026)
Rank #3
Confidence: 41.9%
Evidence: 14 pts
DuckDB NSQL Leaderboard: all_execution_accuracy
Value 76.9% · Conf 100.0% · Weight 6.0%
duckdb_nsql_leaderboard.all_execution_accuracy (Mar 12, 2026)
JSONSchemaBench Leaderboard: medium_schema_compliance_pct
Value 100.0% · Conf 100.0% · Weight 2.9%
jsonschemabench_leaderboard.medium_schema_compliance_pct (Mar 12, 2026)
DuckDB NSQL Leaderboard: hard_execution_accuracy
Value 50.0% · Conf 100.0% · Weight 2.9%
duckdb_nsql_leaderboard.hard_execution_accuracy (Mar 12, 2026)
JSONSchemaBench Leaderboard: hard_schema_compliance_pct
Value 100.0% · Conf 100.0% · Weight 2.1%
jsonschemabench_leaderboard.hard_schema_compliance_pct (Mar 12, 2026)
MEGA-Bench: overall_score
Value 92.8% · Conf 100.0% · Weight 0.5%
mega_bench.overall_score (Mar 12, 2026)