Best Text-to-SQL Model
Ranked text-to-SQL models for converting natural language questions into accurate SQL queries.
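In practice, "text-to-SQL" here means prompting a model with a database schema plus a natural-language question and parsing SQL out of the reply. A minimal sketch of that round trip, with the model call itself omitted (the prompt wording and helper names are illustrative, not any benchmark's official harness):

```python
import re

def build_prompt(schema_ddl: str, question: str) -> str:
    """Combine the database schema and the user question into one prompt."""
    return (
        "Given the following SQLite schema:\n\n"
        f"{schema_ddl}\n\n"
        f"Write a single SQL query answering: {question}\n"
        "Reply with the query in a ```sql code block."
    )

def extract_sql(reply: str) -> str:
    """Pull the SQL out of a fenced ```sql block in the model's reply."""
    m = re.search(r"```sql\s*(.*?)```", reply, re.DOTALL | re.IGNORECASE)
    return (m.group(1) if m else reply).strip()

# Example round trip with a hand-written stand-in for the model's reply:
prompt = build_prompt(
    "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);",
    "Who earns more than 100000?",
)
reply = "Sure:\n```sql\nSELECT name FROM employees WHERE salary > 100000;\n```"
print(extract_sql(reply))  # SELECT name FROM employees WHERE salary > 100000;
```

Schema grounding is the part that matters: the benchmarks below all supply the schema in-context, so a model's score reflects how well it maps question terms onto the given tables and columns, not schema recall.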
Full analysis available: benchmark methodology, patterns in the data, and deployment notes.
#1 Recommendation
gpt-5-2025-08-07 (external/openai/gpt-5-2025-08-07)
Strong on Spider2.0 Snow Text-to-SQL (snow_text_to_sql_score_pct) and LiveSQLBench (success_rate_pct).
- Score: 21.5%
- Confidence: 29.3%
- Evidence: 29
- Ranked Models: 30
- Evidence Quality: 93%
- Evidence Points: 29
- Top Signal: Spider2.0 Snow Text-to-SQL (snow_text_to_sql_score_pct)
- Benchmark Sources: 43
- Last Updated: 21h ago
All Ranked Models
| Rank | Model | Strong on | Score |
|---|---|---|---|
| 🥇 | gpt-5-2025-08-07 | Spider2.0 Snow Text-to-SQL (snow_text_to_sql_score_pct), LiveSQLBench (success_rate_pct) | 21.5% |
| 🥈 | gpt-4o | DuckDB NSQL Leaderboard (all_execution_accuracy), JSONSchemaBench Leaderboard (medium_schema_compliance_pct) | 20.6% |
| 🥉 | qwen-2.5-72b-instruct | DuckDB NSQL Leaderboard (all_execution_accuracy), JSONSchemaBench Leaderboard (medium_schema_compliance_pct) | 20.3% |
| #4 | o3-20250416 | Spider2.0 Snow Text-to-SQL (snow_text_to_sql_score_pct), LiveSQLBench (success_rate_pct) | 19.0% |
| #5 | deepseek-r1 | DuckDB NSQL Leaderboard (all_execution_accuracy), DuckDB NSQL Leaderboard (hard_execution_accuracy) | 18.1% |
| #6 | claude-sonnet-4 | LiveSQLBench (success_rate_pct), Spider2.0 Lite Text-to-SQL (lite_text_to_sql_score_pct) | 17.2% |
| #9 | gemini-3.1-pro-preview | FACTS Benchmark Suite (facts_search_score_pct), FACTS Benchmark Suite (facts_grounding_score_pct) | 16.2% |
| #11 | gemini-2.5-pro | FACTS Benchmark Suite (facts_grounding_score_pct), Vectara HHEM Leaderboard (overall_hallucination_error_pct) | 15.0% |
| #13 | Grok-4-0709 | Vals CorpFin v2 (overall_accuracy_pct), Berkeley Function Calling Leaderboard (Overall Acc) | 14.5% |
| #14 | gpt-5.2-2025-12-11 | FACTS Benchmark Suite (facts_grounding_score_pct), Vals CorpFin v2 (overall_accuracy_pct) | 14.4% |
| #15 | qwen-2.5-coder7b-instruct | DuckDB NSQL Leaderboard (all_execution_accuracy), JSONSchemaBench Leaderboard (medium_schema_compliance_pct) | 14.1% |
| #16 | phi-4 | DuckDB NSQL Leaderboard (all_execution_accuracy), Open LLM Leaderboard GPQA (gpqa) | 14.1% |
| #17 | gpt-5-mini-2025-08-07 | Vals Finance Agent (overall_accuracy_pct), Vals CorpFin v2 (overall_accuracy_pct) | 14.1% |
| #18 | gemini-3-pro-preview | Berkeley Function Calling Leaderboard (Overall Acc), Vals CorpFin v2 (overall_accuracy_pct) | 14.1% |
| #19 | gemini-3-flash-preview | Vals CorpFin v2 (overall_accuracy_pct), FACTS Benchmark Suite (facts_grounding_score_pct) | 14.0% |
| #20 | gpt-4.1-20250414 | Vectara HHEM Leaderboard (overall_hallucination_error_pct), Vals CorpFin v2 (overall_accuracy_pct) | 12.9% |
| #21 | claude-sonnet-4.6 | Vals Finance Agent (overall_accuracy_pct), Vals CorpFin v2 (overall_accuracy_pct) | 12.5% |
| #22 | gemini-3.1-flash-lite-preview | FACTS Benchmark Suite (facts_grounding_score_pct), Vectara HHEM Leaderboard (overall_hallucination_error_pct) | 12.5% |
| #23 | claude-opus-4-5-20251101 | FACTS Benchmark Suite (facts_grounding_score_pct), Vals CorpFin v2 (overall_accuracy_pct) | 12.2% |
| #24 | gpt-4o-2024-08-06 | DuckDB NSQL Leaderboard (all_execution_accuracy), Vectara HHEM Leaderboard (overall_hallucination_error_pct) | 12.1% |
| #25 | Llama-3.3-70B-Instruct | DuckDB NSQL Leaderboard (all_execution_accuracy), Open LLM Leaderboard MMLU-Pro (mmlu_pro_accuracy_pct) | 12.0% |
| #26 | o4-mini | LiveSQLBench (success_rate_pct), Vals CorpFin v2 (overall_accuracy_pct) | 11.5% |
| #27 | grok-4-1-fast-reasoning | Berkeley Function Calling Leaderboard (Overall Acc), Vals CorpFin v2 (overall_accuracy_pct) | 11.5% |
| #28 | gemini-2.5-flash | FACTS Benchmark Suite (facts_grounding_score_pct), Vectara HHEM Leaderboard (overall_hallucination_error_pct) | 11.4% |
| #30 | gemma-2-27b-it | DuckDB NSQL Leaderboard (all_execution_accuracy), Open LLM Leaderboard GPQA (gpqa) | 11.3% |
| #32 | gpt-4o-mini-2024-07-18 | DuckDB NSQL Leaderboard (all_execution_accuracy), DuckDB NSQL Leaderboard (hard_execution_accuracy) | 11.2% |
| #33 | qwen-2.5-coder32b-instruct | DuckDB NSQL Leaderboard (all_execution_accuracy), Open LLM Leaderboard MMLU-Pro (mmlu_pro_accuracy_pct) | 11.2% |
| #34 | gpt-5.4-2026-03-05 | Vectara HHEM Leaderboard (overall_hallucination_error_pct), Vals CorpFin v2 (overall_accuracy_pct) | 11.1% |
| #39 | Phi-3-medium-128k-instruct | DuckDB NSQL Leaderboard (all_execution_accuracy), DuckDB NSQL Leaderboard (hard_execution_accuracy) | 10.3% |
| #41 | grok-4-fast-reasoning | Vals CorpFin v2 (overall_accuracy_pct), Vals Finance Agent (overall_accuracy_pct) | 10.2% |
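Several of the metrics above (e.g. DuckDB NSQL's all_execution_accuracy and hard_execution_accuracy) score a model by executing its predicted SQL and comparing result sets against a gold query, rather than comparing query text. A minimal sketch of that check using stdlib sqlite3 (the actual leaderboards use their own databases and harnesses; names here are illustrative):

```python
import sqlite3
from collections import Counter

def execution_match(db: sqlite3.Connection, predicted: str, gold: str) -> bool:
    """True if both queries run and return the same multiset of rows.

    Comparing as a multiset makes the check order-insensitive, so two
    semantically equivalent queries with different ORDER BY still match.
    """
    try:
        pred_rows = db.execute(predicted).fetchall()
    except sqlite3.Error:
        return False  # predicted SQL that fails to execute counts as a miss
    gold_rows = db.execute(gold).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)

# Tiny worked example on an in-memory database:
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (name TEXT, salary REAL)")
db.executemany("INSERT INTO t VALUES (?, ?)", [("a", 120000.0), ("b", 90000.0)])
print(execution_match(db,
                      "SELECT name FROM t WHERE salary > 1e5",
                      "SELECT name FROM t WHERE salary > 100000"))  # True
```

Execution match is a looser criterion than exact string match, which is why execution-accuracy numbers are the ones worth comparing across models: superficially different queries that return the same data all count as correct.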
Head-to-Head: #1 vs #2

| | Model | Strong on | Confidence |
|---|---|---|---|
| #1 (Top Pick) | gpt-5-2025-08-07 | Spider2.0 Snow Text-to-SQL (snow_text_to_sql_score_pct), LiveSQLBench (success_rate_pct) | 29.3% |
| #2 | gpt-4o | DuckDB NSQL Leaderboard (all_execution_accuracy), JSONSchemaBench Leaderboard (medium_schema_compliance_pct) | 38.0% |
Related Lookups

- **Best LLM for Code Generation**: Benchmark-backed ranking of models for generating correct, secure code from requirements.
- **Best LLM for Debugging**: Find the top-ranked models for localizing bugs and proposing fixes with explanations.
- **Best LLM for Unit Test Generation**: Ranked models for generating meaningful unit tests and edge cases from code.
- **Best LLM for Code Review**: Compare models for automated PR review covering correctness, security, and maintainability.
- **Best LLM for Autonomous Coding**: Benchmark-backed ranking of models for end-to-end autonomous software engineering and issue resolution.
- **Best LLM for Function Calling**: Compare models for reliable tool use, function selection, and multi-step API orchestration.