finance
Quant research code generation
Generate backtest or analysis code from trading hypotheses.
#1 Recommendation
anthropic/claude-sonnet-4.6
Strong on Vals Finance Agent overall_accuracy_pct (100.0%) and Vals CorpFin v2 overall_accuracy_pct (91.5%)
external/anthropic/claude-sonnet-4-6
Score: 32.9%
Confidence: 42.0%
Ranked Models: 30
Evidence Quality: 86%
Scoring: Benchmark-backed
Top Signal: Vals Finance Agent: overall_accuracy_pct
Compare Models
Model A leads by +3.5%
Model A
anthropic/claude-sonnet-4.6
external/anthropic/claude-sonnet-4-6
Rank #1
Vals Finance Agent: overall_accuracy_pct
Value 100.0% · Conf 100.0% · Weight 2.4%
vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)
Vals CorpFin v2: overall_accuracy_pct
Value 91.5% · Conf 100.0% · Weight 2.0%
vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)
Vals Tax Eval v2: overall_accuracy_pct
Value 100.0% · Conf 100.0% · Weight 2.0%
vals_tax_eval_v2.overall_accuracy_pct (Mar 12, 2026)
OpenHands Issue Resolution: issue_resolution_score_pct
Value 71.8% · Conf 100.0% · Weight 1.4%
openhands_issue_resolution.issue_resolution_score_pct (Mar 12, 2026)
Model B
gemini-3-pro-preview
external/google/gemini-3-pro-preview
Rank #2
Vals Finance Agent: overall_accuracy_pct
Value 87.0% · Conf 100.0% · Weight 2.1%
vals_finance_agent.overall_accuracy_pct (Mar 12, 2026)
Vals CorpFin v2: overall_accuracy_pct
Value 86.7% · Conf 100.0% · Weight 1.9%
vals_corp_fin_v2.overall_accuracy_pct (Mar 12, 2026)
Vals Tax Eval v2: overall_accuracy_pct
Value 87.1% · Conf 100.0% · Weight 1.8%
vals_tax_eval_v2.overall_accuracy_pct (Mar 12, 2026)
Vals Mortgage Tax: overall_accuracy_pct
Value 99.3% · Conf 100.0% · Weight 1.4%
vals_mortgage_tax.overall_accuracy_pct (Mar 12, 2026)
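The cards above list a value, a confidence, and a weight for each signal, but the page does not say how these combine into the overall score. As a rough sketch only, assuming each signal contributes value × confidence × weight (a hypothetical rule, not confirmed by the page), the listed signals can be compared like this. The names `contribution` and `listed_total` are illustrative.

```python
# Signals copied from the Model A and Model B cards:
# (signal id, value %, confidence %, weight %)
MODEL_A = [
    ("vals_finance_agent.overall_accuracy_pct", 100.0, 100.0, 2.4),
    ("vals_corp_fin_v2.overall_accuracy_pct", 91.5, 100.0, 2.0),
    ("vals_tax_eval_v2.overall_accuracy_pct", 100.0, 100.0, 2.0),
    ("openhands_issue_resolution.issue_resolution_score_pct", 71.8, 100.0, 1.4),
]
MODEL_B = [
    ("vals_finance_agent.overall_accuracy_pct", 87.0, 100.0, 2.1),
    ("vals_corp_fin_v2.overall_accuracy_pct", 86.7, 100.0, 1.9),
    ("vals_tax_eval_v2.overall_accuracy_pct", 87.1, 100.0, 1.8),
    ("vals_mortgage_tax.overall_accuracy_pct", 99.3, 100.0, 1.4),
]


def contribution(value_pct: float, conf_pct: float, weight_pct: float) -> float:
    """ASSUMED rule: value x confidence x weight, in percentage points.

    The dashboard does not document its aggregation; this is a guess.
    """
    return value_pct * (conf_pct / 100.0) * (weight_pct / 100.0)


def listed_total(signals) -> float:
    """Sum the assumed contributions over the signals shown on a card."""
    return sum(contribution(v, c, w) for _, v, c, w in signals)


if __name__ == "__main__":
    for label, signals in (("Model A", MODEL_A), ("Model B", MODEL_B)):
        for name, v, c, w in signals:
            print(f"{label}  {name}: {contribution(v, c, w):.4f}")
        print(f"{label}  total over listed signals: {listed_total(signals):.4f}")
```

The four signals shown per card carry only 7.8% (Model A) and 7.2% (Model B) of total weight, so these sums are not expected to reproduce the 32.9% score or the +3.5% lead reported above.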
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 50
Sources: 8
Quality: Sufficient
Vals CorpFin v2 (vals_corp_fin_v2): 42 rows, 1.1% avg lift
Vals Tax Eval v2 (vals_tax_eval_v2): 42 rows, 1.7% avg lift
Vals GPQA (vals_gpqa): 40 rows, 0.7% avg lift
Vals Mortgage Tax (vals_mortgage_tax): 33 rows, 1.2% avg lift
Missing Strong Models
gpt-4o-2024-05-13
external/openai/gpt-4o-2024-05-13
Rank #51
10.5%
Taxonomy Details
Related Use Cases
finance
Thesis red teaming
Stress-test an investment thesis with counterarguments and risk.
Top: gemini-3-pro-preview
finance
Earnings call synthesis
Summarize earnings calls into key points, tone, and risks.
Top: gemini-3-pro-preview
finance
AML alert triage
Triage AML alerts into severity, rationale, and next actions.
Top: gemini-3-pro-preview
finance
KYC profile synthesis
Turn identity docs and notes into a structured KYC profile.
Top: gemini-3-pro-preview