education
Socratic tutor
Teach concepts by guiding with questions and stepwise hints.
#1 Recommendation
gpt-4.1-20250414
Strong on OpenVLM TextVQA Official (textvqa_score_pct: 77%) and OpenVLM OCRBench Official (ocrbench_score_pct: 88%)
external/openai/gpt-4-1-20250414
Score: 23.3%
Confidence: 36.1%
Limited benchmark evidence for this use case.
24 models are ranked, with an average evidence score of 13.3 points. Rankings may shift as more benchmark data is ingested.
Ranked Models: 24
Evidence Quality: 80%
Scoring: Benchmark-backed
Top Signal: OpenVLM TextVQA Official: textvqa_score_pct
All Ranked Models
| Rank | Model | Score |
|---|---|---|
| #1 | gpt-4.1-20250414 | 23.3% |
| #5 | gpt-4.1-mini-20250414 | 19.4% |
| #15 | gemini-2.5-flash | 16.2% |
| #30 | google/gemini-2.0-flash-001 | 14.3% |
| #50 | gemini-2.5-pro | 12.3% |
| #53 | gpt-5-2025-08-07 | 11.9% |
| #54 | google/gemini-3.1-pro-preview | 11.9% |
| #57 | Llama-3.1-70B-Instruct | 11.8% |
| #63 | Qwen-VL-Chat | 11.4% |
| #65 | gpt-5-mini-2025-08-07 | 11.3% |
| #83 | gpt-4o | 9.8% |
| #89 | gemini-3-pro-preview | 9.6% |
| #97 | Grok-4-0709 | 9.1% |
| #98 | Llama-3.3-70B-Instruct | 9.0% |
| #99 | GPT-4.1-nano-2025-04-14 | 9.0% |
| #113 | kimi/kimi-k2.5-thinking | 8.5% |
| #118 | claude-sonnet-4-20250514 | 8.3% |
| #141 | phi-4 | 6.7% |
| #147 | deepseek/deepseek-r1 | 6.0% |
| #148 | qwen-2.5-72b-instruct | 5.9% |
| #155 | Meta-Llama-3-8B-Instruct | 4.6% |
| #157 | openai/gpt-4o-mini-2024-07-18 | 4.4% |
| #160 | Phi-4-multimodal-instruct | 3.4% |
| #169 | Qwen3-30B-A3B | 0.9% |
Compare Models
Model A leads by 3.9 points (23.3% vs. 19.4%).
Model A
gpt-4.1-20250414
external/openai/gpt-4-1-20250414
Rank #1
OpenVLM TextVQA Official: textvqa_score_pct
Value 76.8% · Conf 100.0% · Weight 3.2%
openvlm_textvqa_official.textvqa_score_pct (Mar 12, 2026)
OpenVLM OCRBench Official: ocrbench_score_pct
Value 87.7% · Conf 100.0% · Weight 3.2%
openvlm_ocrbench_official.ocrbench_score_pct (Mar 12, 2026)
OpenVLM MTVQA Official: mtvqa_score_pct
Value 92.4% · Conf 100.0% · Weight 2.6%
openvlm_mtvqa_official.mtvqa_score_pct (Mar 12, 2026)
MMLongBench-Doc Leaderboard: acc_score_pct
Value 74.6% · Conf 100.0% · Weight 1.5%
mmlongbench_doc_leaderboard.acc_score_pct (Mar 12, 2026)
Model B
gpt-4.1-mini-20250414
external/openai/gpt-4-1-mini-20250414
Rank #5
OpenVLM OCRBench Official: ocrbench_score_pct
Value 88.4% · Conf 100.0% · Weight 3.2%
openvlm_ocrbench_official.ocrbench_score_pct (Mar 12, 2026)
OpenVLM TextVQA Official: textvqa_score_pct
Value 70.2% · Conf 100.0% · Weight 3.0%
openvlm_textvqa_official.textvqa_score_pct (Mar 12, 2026)
OpenVLM MTVQA Official: mtvqa_score_pct
Value 100.0% · Conf 100.0% · Weight 2.8%
openvlm_mtvqa_official.mtvqa_score_pct (Mar 12, 2026)
OpenVLM ChartQA Human Official: chartqa_human_score_pct
Value 46.9% · Conf 100.0% · Weight 1.3%
openvlm_chartqa_human_official.chartqa_human_score_pct (Mar 12, 2026)
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 24
Sources: 8
Quality: Insufficient
| Source | ID | Rows | Avg. Lift |
|---|---|---|---|
| Vals GPQA | vals_gpqa | 12 | 1.2% |
| Vals Mortgage Tax | vals_mortgage_tax | 12 | 0.4% |
| Vals Tax Eval v2 | vals_tax_eval_v2 | 12 | 0.3% |
| Vals MedQA | vals_medqa | 11 | 0.4% |
Missing Strong Models
| Rank | Model | ID | Score |
|---|---|---|---|
| #4 | anthropic/claude-sonnet-4.6 | external/anthropic/claude-sonnet-4-6 | 21.1% |
| #10 | openai/gpt-5.4-2026-03-05 | external/openai/gpt-5-4-2026-03-05 | 18.9% |
| #13 | claude-opus-4-5-20251101 | external/anthropic/claude-opus-4-5-20251101 | 17.0% |
| #14 | gpt-5.1-2025-11-13 | external/openai/gpt-5-1-2025-11-13 | 17.0% |
Taxonomy Details
Core Tasks
Required Modes
Domains
Related Use Cases
education
Grammar and writing coach
Correct grammar and explain fixes at the learner's level.
Top: gemini-2.5-flash
education
Lesson plan generator
Generate lesson plans with objectives, activities, and assessments.
Top: gpt-4.1-20250414
education
Language conversation partner
Conversational practice with gentle corrections and explanations.
Top: gemini-2.5-flash
education
Grading and feedback assistant
Provide rubric-tagged feedback drafts for educator review.
Top: gpt-4.1-20250414