BasedAGI
Rankings live

data_analytics

Data quality assistant

Propose validation checks and likely data issues from schema and symptoms.

#1 Recommendation

gpt-4o

Strong on DuckDB NSQL Leaderboard all_execution_accuracy (77%) and JSONSchemaBench Leaderboard medium_schema_compliance_pct (100%)

external/openai/gpt-4o

Score: 21.1%

Confidence: 35.9%

Limited benchmark evidence for this use case.

59 ranked models with average evidence of 11.4 points. Rankings may shift as more benchmark data is ingested.

Ranked Models

30

Evidence Quality

79%

Scoring

Benchmark-backed

Top Signal

DuckDB NSQL Leaderboard: all_execution_accuracy

All Ranked Models

30 of 30
Rank  Model                                        Score
#1    gpt-4o                                       21.1%
      Strong on DuckDB NSQL Leaderboard all_execution_accuracy (77%) and JSONSchemaBench Leaderboard medium_schema_compliance_pct (100%)
#2    qwen-2.5-72b-instruct                        20.9%
      Strong on DuckDB NSQL Leaderboard all_execution_accuracy (83%) and JSONSchemaBench Leaderboard medium_schema_compliance_pct (90%)
#3    gpt-4o-20241120                              20.3%
      Strong on DuckDB NSQL Leaderboard all_execution_accuracy (96%) and DuckDB NSQL Leaderboard hard_execution_accuracy (75%)
#5    deepseek/deepseek-r1                         18.4%
#7    openai/gpt-4o-mini-2024-07-18                15.9%
#10   gpt-4o-2024-08-06                            15.0%
#14   gemini-3-pro-preview                         13.0%
#15   google/gemini-2.0-flash-001                  12.8%
#16   gpt-4.1-20250414                             12.6%
#19   gemini-2.5-pro                               12.3%
#20   Grok-4-0709                                  12.3%
#23   google/gemini-3.1-pro-preview                11.9%
#24   claude-sonnet-4-20250514                     11.9%
#27   Llama-3.3-70B-Instruct                       11.5%
#30   gpt-5-2025-08-07                             10.9%
#32   openai/gpt-5.4-2026-03-05                    10.7%
#34   gemma-2-27b-it                               10.6%
#35   gpt-5.1-2025-11-13                           10.4%
#38   gemini-2.5-flash                             10.4%
#40   anthropic/claude-sonnet-4.6                  10.3%
#41   phi-4                                        10.3%
#42   claude-opus-4-5-20251101                     10.3%
#43   gpt-5-mini-2025-08-07                        10.0%
#44   Qwen3-30B-A3B                                9.9%
#45   anthropic/claude-opus-4-6-thinking           9.8%
#46   gemini-3-flash-preview                       9.8%
#47   Qwen2.5-Coder-7B                             9.7%
#48   gpt-5.2-2025-12-11                           9.6%
#49   anthropic/claude-opus-4-5-20251101-thinking  9.5%
#53   Qwen3-32B                                    9.4%

Compare Models

Model A leads by +0.3%


Model A

gpt-4o

external/openai/gpt-4o

21.1%

Rank #1

Confidence 35.9% · 14 evidence pts

DuckDB NSQL Leaderboard: all_execution_accuracy

Value 76.9% · Conf 100.0% · Weight 6.2%

duckdb_nsql_leaderboard.all_execution_accuracy (Mar 12, 2026)

JSONSchemaBench Leaderboard: medium_schema_compliance_pct

Value 100.0% · Conf 100.0% · Weight 3.0%

jsonschemabench_leaderboard.medium_schema_compliance_pct (Mar 12, 2026)

JSONSchemaBench Leaderboard: hard_schema_compliance_pct

Value 100.0% · Conf 100.0% · Weight 2.1%

jsonschemabench_leaderboard.hard_schema_compliance_pct (Mar 12, 2026)

DuckDB NSQL Leaderboard: hard_execution_accuracy

Value 50.0% · Conf 100.0% · Weight 1.5%

duckdb_nsql_leaderboard.hard_execution_accuracy (Mar 12, 2026)

Model B

qwen-2.5-72b-instruct

external/qwen/qwen-2-5-72b-instruct

20.9%

Rank #2

Confidence 29.5% · 11 evidence pts

DuckDB NSQL Leaderboard: all_execution_accuracy

Value 82.7% · Conf 100.0% · Weight 6.6%

duckdb_nsql_leaderboard.all_execution_accuracy (Mar 12, 2026)

JSONSchemaBench Leaderboard: medium_schema_compliance_pct

Value 90.1% · Conf 100.0% · Weight 2.7%

jsonschemabench_leaderboard.medium_schema_compliance_pct (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg AC

Value 76.1% · Conf 100.0% · Weight 1.6%

galileo_agent_v2.avg_ac (Mar 12, 2026)

JSONSchemaBench Leaderboard: hard_schema_compliance_pct

Value 74.4% · Conf 100.0% · Weight 1.6%

jsonschemabench_leaderboard.hard_schema_compliance_pct (Mar 12, 2026)
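
The page does not say how the Value, Conf, and Weight figures combine into a model's score. As a rough illustration only, the sketch below assumes each signal contributes value × confidence × weight and sums the four signals listed for each model. The `Signal` type and the `listed_contribution` helper are hypothetical names, not part of the site. Because only four of each model's evidence points (14 and 11 respectively) are shown, the sums are expected to fall well short of the displayed scores.

```python
from typing import NamedTuple


class Signal(NamedTuple):
    """One benchmark signal as displayed on the comparison card (hypothetical type)."""
    key: str
    value: float       # benchmark value as a fraction (76.9% -> 0.769)
    confidence: float  # "Conf" as a fraction
    weight: float      # "Weight" as a fraction


def listed_contribution(signals: list[Signal]) -> float:
    """Sum value * confidence * weight over the listed signals.

    ASSUMPTION: this aggregation rule is a guess for illustration;
    the page does not document the real scoring formula.
    """
    return sum(s.value * s.confidence * s.weight for s in signals)


# Figures copied from the Model A (gpt-4o) card.
GPT_4O = [
    Signal("duckdb_nsql_leaderboard.all_execution_accuracy", 0.769, 1.0, 0.062),
    Signal("jsonschemabench_leaderboard.medium_schema_compliance_pct", 1.000, 1.0, 0.030),
    Signal("jsonschemabench_leaderboard.hard_schema_compliance_pct", 1.000, 1.0, 0.021),
    Signal("duckdb_nsql_leaderboard.hard_execution_accuracy", 0.500, 1.0, 0.015),
]

# Figures copied from the Model B (qwen-2.5-72b-instruct) card.
QWEN_25_72B = [
    Signal("duckdb_nsql_leaderboard.all_execution_accuracy", 0.827, 1.0, 0.066),
    Signal("jsonschemabench_leaderboard.medium_schema_compliance_pct", 0.901, 1.0, 0.027),
    Signal("galileo_agent_v2.avg_ac", 0.761, 1.0, 0.016),
    Signal("jsonschemabench_leaderboard.hard_schema_compliance_pct", 0.744, 1.0, 0.016),
]

if __name__ == "__main__":
    for name, signals in [("gpt-4o", GPT_4O), ("qwen-2.5-72b-instruct", QWEN_25_72B)]:
        print(f"{name}: {listed_contribution(signals):.1%} from {len(signals)} listed signals")
```

Under this assumption the listed signals account for roughly 10.6% (gpt-4o) and 10.3% (qwen-2.5-72b-instruct), about half of each displayed score. That would be consistent with the remaining, unlisted evidence points supplying the rest, but it does not confirm the formula.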

Ranking Diagnostics & Missing Models

Source Lift

Ranked

59

Sources

8

Quality

Insufficient

Vals Legal Bench

vals_legal_bench

37 rows

0.6% avg lift

Vals Tax Eval v2

vals_tax_eval_v2

37 rows

0.5% avg lift

Vals LiveCodeBench

vals_lcb

36 rows

0.5% avg lift

Vals CorpFin v2

vals_corp_fin_v2

36 rows

0.5% avg lift

Missing Strong Models

zai/glm-5-thinking

external/zai/glm-5-thinking

Rank #32

13.0%

Thin evidence after weighting

x-ai/grok-3

external/x-ai/grok-3

Rank #35

12.1%

Thin evidence after weighting

gpt-4o-2024-05-13

external/openai/gpt-4o-2024-05-13

Rank #51

10.5%

Thin evidence after weighting

qwen/qwen3-max

external/qwen/qwen3-max

Rank #55

10.3%

Thin evidence after weighting
Taxonomy Details

Core Tasks

task.data_quality_checks_text

Required Modes

none

Domains

domain.data_analytics_bi
