BasedAGI

real_estate

Title-like document entity extraction

Extract and reconcile owners/entities across fragmented property docs.

#1 Recommendation

gpt-4.1-20250414

Strong on MMLongBench-Doc Leaderboard acc_score_pct (75%) and Galileo Agent Leaderboard v2 Avg AC (100%)

external/openai/gpt-4-1-20250414

Score: 16.4%

Confidence: 21.6%

Limited benchmark evidence for this use case.

11 ranked models with average evidence of 15.0 points. Rankings may shift as more benchmark data is ingested.

Ranked Models: 11

Evidence Quality: 68%

Scoring: Benchmark-backed

Top Signal: MMLongBench-Doc Leaderboard: acc_score_pct

All Ranked Models

Rank · Model · Score
#2 · gpt-4.1-20250414 · 16.4%
#15 · gemini-2.5-pro · 10.8%
#19 · gpt-4o-20241120 · 10.1%
#20 · gemini-3-pro-preview · 9.8%
#21 · claude-sonnet-4-20250514 · 9.7%
#22 · qwen-2.5-72b-instruct · 9.7%
#23 · Grok-4-0709 · 9.4%
#24 · gemini-2.5-flash · 9.1%
#25 · gpt-4o · 8.7%
#26 · deepseek/deepseek-r1 · 7.5%
#28 · openai/gpt-4o-mini-2024-07-18 · 5.8%

Compare Models

Model A leads by +5.6%
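
The lead appears to be the gap between the two models' scores, expressed in percentage points (16.4% for Model A minus 10.8% for Model B). A quick check under that assumption; the variable names are illustrative and not taken from BasedAGI:

```python
# Scores (in percent) as displayed for the two compared models.
score_a = 16.4  # gpt-4.1-20250414
score_b = 10.8  # gemini-2.5-pro

# Assumed definition: lead = difference in score, in percentage points.
lead = round(score_a - score_b, 1)

print(f"Model A leads by {lead:+.1f} percentage points")
```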

Model A

gpt-4.1-20250414 (external/openai/gpt-4-1-20250414)

Score 16.4% · Rank #2 · Confidence 21.6% · 18 evidence pts

MMLongBench-Doc Leaderboard: acc_score_pct

Value 74.6% · Conf 100.0% · Weight 4.8%

mmlongbench_doc_leaderboard.acc_score_pct (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg AC

Value 100.0% · Conf 100.0% · Weight 3.2%

galileo_agent_v2.avg_ac (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg TSQ

Value 64.1% · Conf 100.0% · Weight 0.7%

galileo_agent_v2.avg_tsq (Mar 12, 2026)

Vectara HHEM Leaderboard: overall_hallucination_error_pct

Value 82.5% · Conf 100.0% · Weight 0.5%

vectara_hhem_leaderboard.overall_hallucination_error_pct (Mar 12, 2026)

Model B

gemini-2.5-pro (external/google/gemini-2-5-pro)

Score 10.8% · Rank #15 · Confidence 17.9% · 21 evidence pts

Galileo Agent Leaderboard v2: Avg AC

Value 58.7% · Conf 100.0% · Weight 1.9%

galileo_agent_v2.avg_ac (Mar 12, 2026)

LEXam Leaderboard: average_score_pct

Value 89.4% · Conf 100.0% · Weight 1.3%

lexam_leaderboard.average_score_pct (Mar 12, 2026)

Galileo Agent Leaderboard v2: Avg TSQ

Value 79.5% · Conf 100.0% · Weight 0.8%

galileo_agent_v2.avg_tsq (Mar 12, 2026)

Vectara HHEM Leaderboard: overall_hallucination_error_pct

Value 76.0% · Conf 100.0% · Weight 0.4%

vectara_hhem_leaderboard.overall_hallucination_error_pct (Mar 12, 2026)
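
The page does not state how Value, Conf, and Weight combine into a model's score. One plausible reading is that each signal contributes value × confidence × weight. The sketch below computes that for the four signals listed per model, purely as an assumption; `partial_contribution` is a hypothetical helper. Because only a subset of each model's evidence points is listed (18 for Model A, 21 for Model B), these partial sums are not expected to reproduce the 16.4% and 10.8% scores.

```python
# (value_pct, confidence, weight) for each listed signal, copied from the comparison.
# Confidence and weight are converted from percent to fractions.
MODEL_A = {
    "mmlongbench_doc_leaderboard.acc_score_pct": (74.6, 1.0, 0.048),
    "galileo_agent_v2.avg_ac": (100.0, 1.0, 0.032),
    "galileo_agent_v2.avg_tsq": (64.1, 1.0, 0.007),
    "vectara_hhem_leaderboard.overall_hallucination_error_pct": (82.5, 1.0, 0.005),
}
MODEL_B = {
    "galileo_agent_v2.avg_ac": (58.7, 1.0, 0.019),
    "lexam_leaderboard.average_score_pct": (89.4, 1.0, 0.013),
    "galileo_agent_v2.avg_tsq": (79.5, 1.0, 0.008),
    "vectara_hhem_leaderboard.overall_hallucination_error_pct": (76.0, 1.0, 0.004),
}


def partial_contribution(signals):
    """Sum value * confidence * weight over the listed signals (assumed formula)."""
    return sum(value * conf * weight for value, conf, weight in signals.values())


for label, signals in (("Model A", MODEL_A), ("Model B", MODEL_B)):
    print(f"{label}: {partial_contribution(signals):.2f} points from listed signals")
```

Under this reading, the higher weights on Model A's top two signals would account for most of the difference between the two partial sums.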

Ranking Diagnostics & Missing Models

Source Lift

Ranked: 11

Sources: 8

Quality: Insufficient

Vals CorpFin v2 (vals_corp_fin_v2) · 7 rows · 0.3% avg lift

Galileo Agent Leaderboard v2 (galileo_agent_v2) · 6 rows · 1.6% avg lift

Vals Legal Bench (vals_legal_bench) · 6 rows · 0.4% avg lift

Vals Mortgage Tax (vals_mortgage_tax) · 6 rows · 0.4% avg lift

Missing Strong Models

anthropic/claude-sonnet-4.6 (external/anthropic/claude-sonnet-4-6) · Rank #4 · 21.1% · Thin evidence after weighting

gpt-5-mini-2025-08-07 (external/openai/gpt-5-mini-2025-08-07) · Rank #7 · 19.6% · Thin evidence after weighting

google/gemini-3.1-pro-preview (external/google/gemini-3-1-pro-preview) · Rank #8 · 19.3% · Thin evidence after weighting

gpt-5-2025-08-07 (external/openai/gpt-5-2025-08-07) · Rank #9 · 19.2% · Thin evidence after weighting

Taxonomy Details

Core Tasks: task.entity_extraction, task.dedupe_normalize_records

Required Modes: mode.json_schema

Domains: domain.real_estate_title_docs
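
The taxonomy pairs entity extraction with record dedupe/normalization under a JSON-schema output mode. The following is a minimal sketch of what that combination could look like for owner records pulled from several title documents. The schema, field names, sample records, and normalization rule are all hypothetical illustrations and do not come from BasedAGI or any benchmark listed here.

```python
import json
import re
from collections import defaultdict

# Hypothetical JSON Schema for one extracted owner record (illustrative only).
OWNER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "source_doc": {"type": "string"},
    },
    "required": ["name", "source_doc"],
}


def has_required_fields(record, schema=OWNER_SCHEMA):
    """Lightweight check that every required key is present as a string."""
    return all(isinstance(record.get(key), str) for key in schema["required"])


def normalize_name(name):
    """Uppercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.upper())
    return re.sub(r"\s+", " ", cleaned).strip()


def reconcile(records):
    """Group valid records by normalized name and collect their source documents."""
    merged = defaultdict(set)
    for record in records:
        if has_required_fields(record):
            merged[normalize_name(record["name"])].add(record["source_doc"])
    return {name: sorted(docs) for name, docs in merged.items()}


# Made-up example records, as a model might emit them from fragmented documents.
records = [
    {"name": "Acme Holdings, LLC", "source_doc": "deed_1.pdf"},
    {"name": "ACME  HOLDINGS LLC", "source_doc": "mortgage_2.pdf"},
    {"name": "Jane Q. Doe", "source_doc": "deed_1.pdf"},
    {"name": "Missing Source"},  # skipped: fails the required-field check
]

print(json.dumps(reconcile(records), indent=2))
```

Exact-match grouping on a normalized string is the simplest possible reconciliation rule; real title work would likely need fuzzy matching and handling of aliases, which this sketch deliberately leaves out.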

Related Use Cases