companion
Tarot-style reading
Symbolic, personalized readings with consistent persona.
#1 Recommendation
Arch-Agent-32B
Strong on BFCL Multi-turn Official (Multi Turn Acc, 70%) and BFCL Relevance Detection Official (Relevance Detection, 81%).
katanemo/Arch-Agent-32B
Score: 22.3%
Confidence: 40.2%
Limited benchmark evidence for this use case.
38 ranked models with average evidence of 12.0 points. Rankings may shift as more benchmark data is ingested.
Ranked Models
30
Evidence Quality
80%
Scoring
Benchmark-backed
Top Signal
BFCL Multi-turn Official: Multi Turn Acc
All Ranked Models
| Rank | Model | Score |
|---|---|---|
| #30 | Arch-Agent-32B | 22.3% |
| #59 | Grok-4-0709 | 17.0% |
| #61 | qwen-2.5-72b-instruct | 16.8% |
| #63 | gpt-4.1-20250414 | 16.5% |
| #67 | Arch-Agent-3B | 16.2% |
| #68 | gemini-2.5-pro | 16.1% |
| #69 | Arch-Agent-1.5B | 15.7% |
| #77 | gpt-4o | 14.6% |
| #100 | xai-org/grok-4-fast-reasoning | 12.1% |
| #109 | xai-org/grok-4-1-fast-reasoning | 11.4% |
| #110 | gemma-2-27b-it | 11.2% |
| #111 | gemini-3-pro-preview | 11.1% |
| #117 | x-ai/grok-3 | 10.7% |
| #122 | google/gemini-3.1-pro-preview | 10.1% |
| #123 | claude-sonnet-4-20250514 | 10.1% |
| #127 | gpt-5-2025-08-07 | 9.2% |
| #129 | openai/gpt-5.4-2026-03-05 | 9.1% |
| #131 | Kimi-K2-Instruct | 8.9% |
| #132 | gpt-5.1-2025-11-13 | 8.8% |
| #134 | xai-org/grok-4-1-fast-non-reasoning | 8.8% |
| #135 | anthropic/claude-sonnet-4.6 | 8.8% |
| #136 | claude-opus-4-5-20251101 | 8.7% |
| #139 | gpt-4o-2024-05-13 | 8.6% |
| #140 | gpt-5-mini-2025-08-07 | 8.5% |
| #142 | gemini-3-flash-preview | 8.3% |
| #143 | xai-org/grok-4-fast-non-reasoning | 8.3% |
| #147 | qwen/qwen3-max | 7.9% |
| #148 | gemini-2.5-flash | 7.9% |
| #149 | deepseek-v3 | 7.7% |
| #151 | kimi/kimi-k2.5-thinking | 7.5% |
Compare Models
Model A leads by +5.2%
Model A
Arch-Agent-32B
katanemo/Arch-Agent-32B
Rank #30
BFCL Multi-turn Official: Multi Turn Acc
Value 70.1% · Conf 100.0% · Weight 7.0%
bfcl_multiturn_official.multi_turn_acc (Mar 12, 2026)
BFCL Relevance Detection Official: Relevance Detection
Value 81.3% · Conf 100.0% · Weight 6.2%
bfcl_relevance_detection_official.relevance_detection (Mar 12, 2026)
BFCL Relevance Detection Official: Irrelevance Detection
Value 81.0% · Conf 100.0% · Weight 2.5%
bfcl_relevance_detection_official.irrelevance_detection (Mar 12, 2026)
BFCL Memory Official: Memory Acc
Value 19.8% · Conf 100.0% · Weight 2.4%
bfcl_memory_official.memory_acc (Mar 12, 2026)
Model B
Grok-4-0709
external/xai/grok-4-0709
Rank #59
UGI Leaderboard: Entertainment
Value 100.0% · Conf 100.0% · Weight 2.7%
ugi_main.entertainment (Mar 12, 2026)
UGI Leaderboard: Writing ✍️
Value 99.2% · Conf 100.0% · Weight 2.7%
ugi_main.writing (Mar 12, 2026)
Galileo Agent Leaderboard v2: Avg TSQ
Value 84.6% · Conf 100.0% · Weight 1.1%
galileo_agent_v2.avg_tsq (Mar 12, 2026)
Galileo Agent Leaderboard v2: Avg AC
Value 56.5% · Conf 100.0% · Weight 1.1%
galileo_agent_v2.avg_ac (Mar 12, 2026)
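The Value, Conf, and Weight figures listed for each signal suggest some form of weighted aggregation. The page does not publish its scoring formula, so the following is only a minimal sketch under one assumption: each signal contributes value × confidence × weight, with all three read as percentages. The function name `weighted_contribution` and the tuple layout are hypothetical; the numbers are copied from the eight signals shown above.

```python
# Hypothetical sketch: the page does not state how scores are computed.
# Assumption: contribution = value * (conf / 100) * (weight / 100).

# (signal id, value %, confidence %, weight %) as listed on the page
ARCH_AGENT_32B = [
    ("bfcl_multiturn_official.multi_turn_acc", 70.1, 100.0, 7.0),
    ("bfcl_relevance_detection_official.relevance_detection", 81.3, 100.0, 6.2),
    ("bfcl_relevance_detection_official.irrelevance_detection", 81.0, 100.0, 2.5),
    ("bfcl_memory_official.memory_acc", 19.8, 100.0, 2.4),
]

GROK_4_0709 = [
    ("ugi_main.entertainment", 100.0, 100.0, 2.7),
    ("ugi_main.writing", 99.2, 100.0, 2.7),
    ("galileo_agent_v2.avg_tsq", 84.6, 100.0, 1.1),
    ("galileo_agent_v2.avg_ac", 56.5, 100.0, 1.1),
]


def weighted_contribution(signals):
    """Sum value * confidence * weight over the listed signals (in score points)."""
    return sum(
        value * (conf / 100.0) * (weight / 100.0)
        for _name, value, conf, weight in signals
    )


partial_a = weighted_contribution(ARCH_AGENT_32B)
partial_b = weighted_contribution(GROK_4_0709)

print(f"Arch-Agent-32B, listed signals only: {partial_a:.2f} points")
print(f"Grok-4-0709, listed signals only:    {partial_b:.2f} points")
```

Under this assumption the four listed signals account for roughly 12.4 of Arch-Agent-32B's 22.3% and roughly 6.9 of Grok-4-0709's 17.0%. The displayed scores therefore either include signals not shown in the comparison panel or use a different formula.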
Ranking Diagnostics & Missing Models
Source Lift
Ranked
38
Sources
8
Quality
Insufficient
Vals CorpFin v2
vals_corp_fin_v2
22 rows
0.4% avg lift
Vals Legal Bench
vals_legal_bench
21 rows
0.5% avg lift
Vals MedQA
vals_medqa
21 rows
0.5% avg lift
Vals GPQA
vals_gpqa
21 rows
0.4% avg lift
Missing Strong Models
gpt-5.2-2025-12-11
external/openai/gpt-5-2-2025-12-11
Rank #16
16.2%
anthropic/claude-opus-4-6-thinking
external/anthropic/claude-opus-4-6-thinking
Rank #17
16.1%
google/gemini-3.1-flash-lite-preview
external/google/gemini-3-1-flash-lite-preview
Rank #19
15.6%
anthropic/claude-opus-4-5-20251101-thinking
external/anthropic/claude-opus-4-5-20251101-thinking
Rank #21
15.2%
Taxonomy Details
Core Tasks
Required Modes
Domains
Related Use Cases
companion
Mindfulness and meditation scripts
Generate calming scripts and exercises tailored to a user's context.
Top: Arch-Agent-32B
companion
Casual chat companion
Engaging conversation with consistent tone and context.
Top: Arch-Agent-32B
companion
Empathetic support chat
Supportive conversation with strong boundaries and safe escalation.
Top: Arch-Agent-32B
companion
Life coaching and goal planning
Goal setting, habit planning, and accountability check-ins.
Top: Arch-Agent-32B