BasedAGI
Rankings live

data_analytics

Text-to-SQL analyst assistant

Convert questions into SQL and explain the query.

#1 Recommendation

gemini-3-pro-preview

Strong on FACTS Benchmark Suite facts_grounding_score_pct (88%) and FACTS Benchmark Suite facts_search_score_pct (100%)

external/google/gemini-3-pro-preview

Score: 19.9%

Confidence: 26.0%

Limited benchmark evidence for this use case.

63 ranked models with average evidence of 12.4 points. Rankings may shift as more benchmark data is ingested.

Ranked Models

30

Evidence Quality

80%

Scoring

Benchmark-backed

Top Signal

FACTS Benchmark Suite: facts_grounding_score_pct

All Ranked Models

Rank Model                                        Score
#1   gemini-3-pro-preview                         19.9%
     Strong on FACTS Benchmark Suite facts_grounding_score_pct (88%) and FACTS Benchmark Suite facts_search_score_pct (100%)
#2   gpt-4o-20241120                              19.5%
     Strong on DuckDB NSQL Leaderboard all_execution_accuracy (96%) and DuckDB NSQL Leaderboard hard_execution_accuracy (75%)
#4   gpt-4o                                       18.2%
#5   qwen-2.5-72b-instruct                        17.9%
#6   gemini-2.5-pro                               17.4%
#7   deepseek/deepseek-r1                         16.4%
#8   anthropic/claude-sonnet-4.6                  15.0%
#9   gpt-5-mini-2025-08-07                        14.7%
#10  gpt-5-2025-08-07                             14.4%
#13  Grok-4-0709                                  14.2%
#15  gpt-4o-2024-08-06                            14.1%
#16  google/gemini-3.1-pro-preview                13.8%
#17  openai/gpt-5.4-2026-03-05                    13.5%
#18  gpt-4.1-20250414                             13.4%
#21  claude-opus-4-5-20251101                     13.1%
#22  openai/gpt-4o-mini-2024-07-18                13.0%
#26  gpt-5.1-2025-11-13                           12.0%
#27  claude-sonnet-4-20250514                     11.9%
#29  gemini-3-flash-preview                       11.7%
#31  google/gemini-3.1-flash-lite-preview         11.2%
#32  xai-org/grok-4-fast-reasoning                11.0%
#34  Qwen3-32B                                    10.8%
#35  phi-4                                        10.7%
#36  gemini-2.5-flash                             10.7%
#39  xai-org/grok-4-1-fast-reasoning              10.5%
#40  anthropic/claude-opus-4-6-thinking           10.5%
#41  gpt-5.2-2025-12-11                           10.5%
#43  kimi/kimi-k2.5-thinking                      9.9%
#46  anthropic/claude-opus-4-5-20251101-thinking  9.7%
#47  google/gemini-2.0-flash-001                  9.6%

Compare Models

Model A leads by +0.4%

Model A

gemini-3-pro-preview

external/google/gemini-3-pro-preview

19.9%

Rank #1

Confidence 26.0% · 23 evidence pts

FACTS Benchmark Suite: facts_grounding_score_pct

Value 88.3% · Conf 100.0% · Weight 2.0%

facts_benchmark_suite.facts_grounding_score_pct (Mar 12, 2026)

FACTS Benchmark Suite: facts_search_score_pct

Value 100.0% · Conf 100.0% · Weight 1.7%

facts_benchmark_suite.facts_search_score_pct (Mar 12, 2026)

FACTS Benchmark Suite: average_score_pct

Value 100.0% · Conf 100.0% · Weight 1.6%

facts_benchmark_suite.average_score_pct (Mar 12, 2026)

Vals Finance Agent: overall_accuracy_pct

Value 87.0% · Conf 100.0% · Weight 1.6%

vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)
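
The page does not say how a signal's value, confidence, and weight combine into the 19.9% score. As a rough illustration only, the sketch below assumes each signal contributes value × confidence × weight, in percentage points. The `Signal` class and `contribution` function are hypothetical names, and the formula is an assumption, not the site's documented method. Only the four signals listed above are included (the model has 23 evidence points), so the total is not expected to reproduce the displayed score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One benchmark signal as displayed on the page (all fields in percent)."""
    name: str
    value_pct: float
    conf_pct: float
    weight_pct: float


def contribution(s: Signal) -> float:
    # ASSUMPTION: contribution = value * confidence * weight.
    # The page does not state its actual scoring formula.
    return s.value_pct * (s.conf_pct / 100.0) * (s.weight_pct / 100.0)


# Values copied from the Model A panel.
MODEL_A_SIGNALS = [
    Signal("facts_benchmark_suite.facts_grounding_score_pct", 88.3, 100.0, 2.0),
    Signal("facts_benchmark_suite.facts_search_score_pct", 100.0, 100.0, 1.7),
    Signal("facts_benchmark_suite.average_score_pct", 100.0, 100.0, 1.6),
    Signal("vals_finance_agent.overall_accuracy_pct", 87.0, 100.0, 1.6),
]

for s in MODEL_A_SIGNALS:
    print(f"{s.name}: {contribution(s):.3f} points")

total = sum(contribution(s) for s in MODEL_A_SIGNALS)
print(f"total from listed signals: {total:.3f} points")
```

Under this assumed formula, a signal with a high value but a small weight moves the total only slightly, which is one way to read the low per-signal weights shown in the panel.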

Model B

gpt-4o-20241120

external/openai/gpt-4o-20241120

19.5%

Rank #2

Confidence 36.6% · 16 evidence pts

DuckDB NSQL Leaderboard: all_execution_accuracy

Value 96.2% · Conf 100.0% · Weight 7.0%

duckdb_nsql_leaderboard.all_execution_accuracy (Mar 12, 2026)

DuckDB NSQL Leaderboard: hard_execution_accuracy

Value 75.0% · Conf 100.0% · Weight 3.6%

duckdb_nsql_leaderboard.hard_execution_accuracy (Mar 12, 2026)

BIRD-CRITIC: success_rate_open_pct

Value 55.6% · Conf 100.0% · Weight 2.0%

bird_critic.success_rate_open_pct (Mar 12, 2026)

Spider2.0 Snow Text-to-SQL: snow_text_to_sql_score_pct

Value 13.5% · Conf 100.0% · Weight 0.7%

spider2_snow_text_to_sql.snow_text_to_sql_score_pct (Mar 12, 2026)
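
Several of the signals above are execution-accuracy metrics. As a minimal sketch of the general idea (not the actual harness of any listed leaderboard), the example below runs a predicted query and a reference query against the same database and counts a hit when both return the same rows. It uses Python's built-in `sqlite3` as a stand-in for DuckDB or Snowflake; the `orders` table, the query pairs, and the function names are made up for illustration.

```python
import sqlite3
from collections import Counter


def run(conn, sql):
    """Return the result rows as an order-insensitive multiset, or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose result rows match."""
    if not pairs:
        return 0.0
    hits = 0
    for predicted, gold in pairs:
        expected = run(conn, gold)
        got = run(conn, predicted)
        # A failing predicted query yields None and never matches.
        if expected is not None and got == expected:
            hits += 1
    return hits / len(pairs)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'EU', 10.0), (2, 'EU', 5.0), (3, 'US', 7.5);
""")

pairs = [
    # Same rows, different alias and ordering: counted as correct.
    ("SELECT region, SUM(amount) FROM orders GROUP BY region",
     "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY region"),
    # Runs, but filters on the wrong region: counted as wrong.
    ("SELECT COUNT(*) FROM orders WHERE region = 'US'",
     "SELECT COUNT(*) FROM orders WHERE region = 'EU'"),
    # Syntax error in the predicted query: counted as wrong.
    ("SELEC * FROM orders",
     "SELECT * FROM orders"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.1%}")
```

Comparing result rows rather than SQL text means two differently written queries can both count as correct. Real benchmarks may add stricter rules, such as respecting row order when the question asks for a sorted result.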

Ranking Diagnostics & Missing Models

Source Lift

Ranked

63

Sources

8

Quality

Insufficient

Vals CorpFin v2

vals_corp_fin_v2

40 rows

1.2% avg lift

Vals Legal Bench

vals_legal_bench

38 rows

0.3% avg lift

Vals Tax Eval v2

vals_tax_eval_v2

35 rows

0.3% avg lift

Vals MedQA

vals_medqa

34 rows

0.3% avg lift

Missing Strong Models

gpt-4.1-mini-20250414

external/openai/gpt-4-1-mini-20250414

Rank #31

13.1%

Thin evidence after weighting

gpt-4o-2024-05-13

external/openai/gpt-4o-2024-05-13

Rank #51

10.5%

Thin evidence after weighting

GPT-4.1-nano-2025-04-14

external/openai/gpt-4-1-nano-2025-04-14

Rank #89

6.4%

Thin evidence after weighting
Taxonomy Details

Core Tasks

task.text_to_sql
task.sql_debugging

Required Modes

mode.json_schema

Domains

domain.data_analytics_bi
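
The required `mode.json_schema` suggests that the assistant's reply should be structured JSON rather than free text. The sketch below shows one way such a reply could be shaped and checked for this use case, where the model returns both the SQL and an explanation. The schema, its field names (`sql`, `explanation`), and `parse_response` are assumptions for illustration; the page does not define them. The check is written by hand with the standard library so that the example has no third-party dependencies.

```python
import json

# Hypothetical response schema; field names are assumptions, not from the page.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "sql": {"type": "string"},
        "explanation": {"type": "string"},
    },
    "required": ["sql", "explanation"],
    "additionalProperties": False,
}


def parse_response(raw: str) -> dict:
    """Parse a model reply and check it against RESPONSE_SCHEMA by hand."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    allowed = set(RESPONSE_SCHEMA["properties"])
    missing = [k for k in RESPONSE_SCHEMA["required"] if k not in data]
    extra = sorted(set(data) - allowed)
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    for key in allowed:
        if not isinstance(data[key], str) or not data[key].strip():
            raise ValueError(f"{key} must be a non-empty string")
    return data


reply = (
    '{"sql": "SELECT region, SUM(amount) FROM orders GROUP BY region", '
    '"explanation": "Totals order amounts per region."}'
)
parsed = parse_response(reply)
print(parsed["sql"])
print(parsed["explanation"])
```

Rejecting replies with missing or extra fields keeps downstream code simple: the query can be passed to the database and the explanation shown to the analyst without further parsing.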

Related Use Cases