BasedAGI

engineering

Component selection assistant

Recommend components under constraints with evidence and tradeoffs.

#1 Recommendation

gemini-3-pro-preview

Strong on FACTS Benchmark Suite: facts_grounding_score_pct (88%) and facts_search_score_pct (100%)

external/google/gemini-3-pro-preview

Score: 37.7%
Confidence: 47.9%
Ranked Models: 30
Evidence Quality: 87%
Scoring: Benchmark-backed
Top Signal: FACTS Benchmark Suite: facts_grounding_score_pct

All Ranked Models

Showing 30 of 30 models
Rank Model                                          Score
#1   gemini-3-pro-preview                           37.7%
     Strong on FACTS Benchmark Suite: facts_grounding_score_pct (88%) and facts_search_score_pct (100%)
#2   anthropic/claude-sonnet-4.6                    30.2%
     Strong on Vals Finance Agent: overall_accuracy_pct (100%) and Vals SWE-bench: overall_accuracy_pct (95%)
#3   gemini-2.5-pro                                 29.1%
     Strong on FACTS Benchmark Suite: facts_grounding_score_pct (100%) and Vectara HHEM Leaderboard: overall_hallucination_error_pct (76%)
#4   google/gemini-3.1-pro-preview                  28.5%
#5   gpt-5-2025-08-07                               28.3%
#6   Grok-4-0709                                    27.9%
#7   openai/gpt-5.4-2026-03-05                      27.9%
#8   claude-opus-4-5-20251101                       26.9%
#9   gpt-5-mini-2025-08-07                          26.0%
#10  gpt-4.1-20250414                               25.5%
#11  gemini-3-flash-preview                         24.8%
#12  gpt-5.1-2025-11-13                             24.7%
#13  claude-sonnet-4-20250514                       23.4%
#14  anthropic/claude-opus-4-6-thinking             23.3%
#15  gpt-5.2-2025-12-11                             22.9%
#16  google/gemini-3.1-flash-lite-preview           22.3%
#17  xai-org/grok-4-fast-reasoning                  21.8%
#18  anthropic/claude-opus-4-5-20251101-thinking    21.7%
#19  kimi/kimi-k2.5-thinking                        21.3%
#20  xai-org/grok-4-1-fast-reasoning                20.8%
#22  zai/glm-5-thinking                             19.8%
#23  anthropic/claude-sonnet-4-5-20250929-thinking  19.8%
#24  alibaba/qwen3.5-flash                          17.5%
#25  anthropic/claude-haiku-4-5-20251001-thinking   16.9%
#27  gemini-2.5-flash                               16.7%
#28  z-ai/glm-4.7                                   16.6%
#29  minimax/minimax-m2.1                           16.5%
#30  o3-20250416                                    16.5%
#31  Kimi K2 Thinking                               16.0%
#32  x-ai/grok-3                                    15.9%

Compare Models

Model A leads Model B by 7.5 percentage points (37.7% vs. 30.2%).

Model A

gemini-3-pro-preview

external/google/gemini-3-pro-preview

Score 37.7% · Rank #1

Confidence 47.9% · 23 evidence points

FACTS Benchmark Suite: facts_grounding_score_pct

Value 88.3% · Conf 100.0% · Weight 2.3%

facts_benchmark_suite.facts_grounding_score_pct (Mar 12, 2026)

FACTS Benchmark Suite: facts_search_score_pct

Value 100.0% · Conf 100.0% · Weight 2.0%

facts_benchmark_suite.facts_search_score_pct (Mar 12, 2026)

FACTS Benchmark Suite: average_score_pct

Value 100.0% · Conf 100.0% · Weight 1.9%

facts_benchmark_suite.average_score_pct (Mar 12, 2026)

Vals SWE-bench: overall_accuracy_pct

Value 87.5% · Conf 100.0% · Weight 1.9%

vals_swebench.overall_accuracy_pct (Mar 12, 2026)

Model B

anthropic/claude-sonnet-4.6

external/anthropic/claude-sonnet-4-6

Score 30.2% · Rank #2

Confidence 37.1% · 17 evidence points

Vals Finance Agent: overall_accuracy_pct

Value 100.0% · Conf 100.0% · Weight 2.1%

vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)

Vals SWE-bench: overall_accuracy_pct

Value 95.1% · Conf 100.0% · Weight 2.0%

vals_swebench.overall_accuracy_pct (Mar 12, 2026)

Vals CorpFin v2: overall_accuracy_pct

Value 91.5% · Conf 100.0% · Weight 1.9%

vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)

Vals LiveCodeBench: overall_accuracy_pct

Value 91.2% · Conf 100.0% · Weight 1.7%

vals_lcb.overall_accuracy_pct (Mar 12, 2026)

Ranking Diagnostics & Missing Models

Source Lift

Ranked: 48 · Sources: 8 · Quality: Sufficient

Vals CorpFin v2 (vals_corp_fin_v2): 42 rows · 1.3% avg lift
Vals LiveCodeBench (vals_lcb): 41 rows · 1.4% avg lift
Vals SWE-bench (vals_swebench): 34 rows · 1.4% avg lift
Vals Legal Bench (vals_legal_bench): 34 rows · 0.3% avg lift

Missing Strong Models

gpt-4o (external/openai/gpt-4o): Rank #22 · Score 15.2%
Thin evidence after weighting

Taxonomy Details

Core Tasks

task.recommendation_matching, task.tradeoff_analysis

Required Modes

none

Domains

domain.electrical_engineering

Related Use Cases