BasedAGI

Best Text-to-SQL Model

Ranked text-to-SQL models for converting natural language questions into accurate SQL queries.

#1 Recommendation

gemini-3-pro-preview

Strong on the FACTS Benchmark Suite: facts_grounding_score_pct (88%) and facts_search_score_pct (100%)

external/google/gemini-3-pro-preview

Score: 19.9%
Confidence: 26.0%
Evidence: 23

Ranked Models: 30
Evidence Quality: 80%
Scoring: Benchmark-backed
Top Signal: FACTS Benchmark Suite facts_grounding_score_pct

All Ranked Models

| Rank | Model | Score | Top signal |
|---|---|---|---|
| #1 | gemini-3-pro-preview | 19.9% | Strong on the FACTS Benchmark Suite: facts_grounding_score_pct (88%) and facts_search_score_pct (100%) |
| #2 | gpt-4o-20241120 | 19.5% | Strong on the DuckDB NSQL Leaderboard: all_execution_accuracy (96%) and hard_execution_accuracy (75%) |
| #4 | gpt-4o | 18.2% | |
| #5 | qwen-2.5-72b-instruct | 17.9% | |
| #6 | gemini-2.5-pro | 17.4% | |
| #7 | deepseek/deepseek-r1 | 16.4% | |
| #8 | anthropic/claude-sonnet-4.6 | 15.0% | |
| #9 | gpt-5-mini-2025-08-07 | 14.7% | |
| #10 | gpt-5-2025-08-07 | 14.4% | |
| #13 | Grok-4-0709 | 14.2% | |
| #15 | gpt-4o-2024-08-06 | 14.1% | |
| #16 | google/gemini-3.1-pro-preview | 13.7% | |
| #17 | openai/gpt-5.4-2026-03-05 | 13.5% | |
| #18 | gpt-4.1-20250414 | 13.4% | |
| #21 | claude-opus-4-5-20251101 | 13.1% | |
| #22 | openai/gpt-4o-mini-2024-07-18 | 13.0% | |
| #26 | gpt-5.1-2025-11-13 | 12.0% | |
| #27 | claude-sonnet-4-20250514 | 11.9% | |
| #29 | gemini-3-flash-preview | 11.7% | |
| #31 | google/gemini-3.1-flash-lite-preview | 11.2% | |
| #32 | xai-org/grok-4-fast-reasoning | 11.0% | |
| #34 | Qwen3-32B | 10.8% | |
| #35 | phi-4 | 10.7% | |
| #36 | gemini-2.5-flash | 10.7% | |
| #39 | xai-org/grok-4-1-fast-reasoning | 10.5% | |
| #40 | anthropic/claude-opus-4-6-thinking | 10.5% | |
| #41 | gpt-5.2-2025-12-11 | 10.5% | |
| #43 | kimi/kimi-k2.5-thinking | 9.9% | |
| #46 | anthropic/claude-opus-4-5-20251101-thinking | 9.7% | |
| #47 | google/gemini-2.0-flash-001 | 9.6% | |
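
The #2 entry is backed by execution-accuracy metrics from the DuckDB NSQL Leaderboard. Execution accuracy is commonly defined as the share of questions for which the generated query, when run, returns the same result as a reference (gold) query. The following is a minimal sketch of that idea, not the leaderboard's own harness. It assumes Python's built-in sqlite3 in place of DuckDB, compares results as unordered multisets of rows, and counts any query that fails to execute as wrong. The helpers `run_query` and `execution_accuracy`, the toy `orders` table, and the per-example `hard` flag (used to mimic an "all" vs "hard" split) are all hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute a query and return its rows as a multiset, or None if it fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, examples):
    """Fraction of examples whose predicted SQL returns the same rows as the gold SQL.

    Row order is ignored (multiset comparison); a prediction that fails
    to execute is counted as incorrect.
    """
    if not examples:
        return 0.0
    correct = 0
    for ex in examples:
        gold = run_query(conn, ex["gold"])
        pred = run_query(conn, ex["pred"])
        if pred is not None and pred == gold:
            correct += 1
    return correct / len(examples)


# Toy in-memory database (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'ann', 10.0), (2, 'bob', 25.0), (3, 'ann', 5.0);
""")

examples = [
    # Different SQL text, same result rows: counted as correct.
    {"gold": "SELECT customer, SUM(amount) FROM orders GROUP BY customer",
     "pred": "SELECT customer, SUM(amount) AS total FROM orders "
             "GROUP BY customer ORDER BY total DESC",
     "hard": False},
    # Wrong filter value: different rows, counted as incorrect.
    {"gold": "SELECT id FROM orders WHERE amount > 8",
     "pred": "SELECT id FROM orders WHERE amount > 20",
     "hard": True},
    # References a column that does not exist: fails to run, counted as incorrect.
    {"gold": "SELECT COUNT(*) FROM orders",
     "pred": "SELECT COUNT(total) FROM orders",
     "hard": True},
]

all_acc = execution_accuracy(conn, examples)
hard_acc = execution_accuracy(conn, [ex for ex in examples if ex["hard"]])
print(f"all: {all_acc:.2f}, hard: {hard_acc:.2f}")  # all: 0.33, hard: 0.00
```

Comparing results rather than SQL text is why the first pair counts as correct even though the queries differ. Ignoring row order is a simplification; a stricter harness would also need to check ordering when the question asks for a ranked result.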

Head-to-Head: #1 vs #2

#1 (Top Pick) gemini-3-pro-preview: score 19.9%, confidence 26.0%
#2 gpt-4o-20241120: score 19.5%, confidence 36.6%