engineering
Component selection assistant
Recommend components under constraints with evidence and tradeoffs.
#1 Recommendation
gemini-3-pro-preview
Strong on FACTS Benchmark Suite facts_grounding_score_pct (88%) and FACTS Benchmark Suite facts_search_score_pct (100%)
external/google/gemini-3-pro-preview
Score: 37.7%
Confidence: 47.9%
Ranked Models: 30
Evidence Quality: 87%
Scoring: Benchmark-backed
Top Signal: FACTS Benchmark Suite: facts_grounding_score_pct
Compare Models
Model A leads by +7.5%
Model A
gemini-3-pro-preview
external/google/gemini-3-pro-preview
Rank #1
FACTS Benchmark Suite: facts_grounding_score_pct
Value 88.3% · Conf 100.0% · Weight 2.3%
facts_benchmark_suite.facts_grounding_score_pct (Mar 12, 2026)
FACTS Benchmark Suite: facts_search_score_pct
Value 100.0% · Conf 100.0% · Weight 2.0%
facts_benchmark_suite.facts_search_score_pct (Mar 12, 2026)
FACTS Benchmark Suite: average_score_pct
Value 100.0% · Conf 100.0% · Weight 1.9%
facts_benchmark_suite.average_score_pct (Mar 12, 2026)
Vals SWE-bench: overall_accuracy_pct
Value 87.5% · Conf 100.0% · Weight 1.9%
vals_swebench.overall_accuracy_pct (Mar 12, 2026)
Model B
anthropic/claude-sonnet-4.6
external/anthropic/claude-sonnet-4-6
Rank #2
Vals Finance Agent: overall_accuracy_pct
Value 100.0% · Conf 100.0% · Weight 2.1%
vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)
Vals SWE-bench: overall_accuracy_pct
Value 95.1% · Conf 100.0% · Weight 2.0%
vals_swebench.overall_accuracy_pct (Mar 12, 2026)
Vals CorpFin v2: overall_accuracy_pct
Value 91.5% · Conf 100.0% · Weight 1.9%
vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)
Vals LiveCodeBench: overall_accuracy_pct
Value 91.2% · Conf 100.0% · Weight 1.7%
vals_lcb.overall_accuracy_pct (Mar 12, 2026)
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 48
Sources: 8
Quality: Sufficient
Vals CorpFin v2 (vals_corp_fin_v2): 42 rows, 1.3% avg lift
Vals LiveCodeBench (vals_lcb): 41 rows, 1.4% avg lift
Vals SWE-bench (vals_swebench): 34 rows, 1.4% avg lift
Vals Legal Bench (vals_legal_bench): 34 rows, 0.3% avg lift
Missing Strong Models
gpt-4o
external/openai/gpt-4o
Rank #22
15.2%
Taxonomy Details
Core Tasks
Required Modes
Domains
Related Use Cases
engineering
Simulation setup assistant
Turn design requirements into simulation setup checklists and boundary notes.
Top: gemini-3-pro-preview
engineering
Verilog/VHDL generation
Generate RTL code and testbenches from functional specs.
Top: z-ai/glm-4.7
engineering
CAD scripting helper
Generate and debug CAD automation scripts and parametric geometry code.
Top: anthropic/claude-sonnet-4.6