companion
Casual chat companion
Engaging conversation with consistent tone and context.
#1 Recommendation
Arch-Agent-32B
Strong on BFCL Multi-turn Official: Multi Turn Acc (70%) and BFCL Relevance Detection Official: Relevance Detection (81%).
katanemo/Arch-Agent-32B
Score: 23.4%
Confidence: 42.2%
Limited benchmark evidence for this use case.
42 ranked models with average evidence of 12.0 points. Rankings may shift as more benchmark data is ingested.
Ranked Models: 30
Evidence Quality: 79%
Scoring: Benchmark-backed
Top Signal: BFCL Multi-turn Official: Multi Turn Acc
All Ranked Models
| Rank | Model | Score |
|---|---|---|
| #30 | Arch-Agent-32B | 23.4% |
| #58 | Grok-4-0709 | 17.9% |
| #61 | gpt-4.1-20250414 | 17.3% |
| #65 | Arch-Agent-3B | 17.0% |
| #66 | gemini-2.5-pro | 16.9% |
| #67 | Arch-Agent-1.5B | 16.5% |
| #92 | xai-org/grok-4-fast-reasoning | 12.7% |
| #100 | xai-org/grok-4-1-fast-reasoning | 12.0% |
| #101 | qwen-2.5-72b-instruct | 11.8% |
| #102 | gemini-3-pro-preview | 11.6% |
| #107 | x-ai/grok-3 | 11.2% |
| #112 | google/gemini-3.1-pro-preview | 10.6% |
| #113 | claude-sonnet-4-20250514 | 10.6% |
| #114 | gpt-4o-2024-05-13 | 10.5% |
| #115 | gpt-5-2025-08-07 | 9.7% |
| #116 | openai/gpt-5.4-2026-03-05 | 9.5% |
| #117 | Kimi-K2-Instruct | 9.3% |
| #118 | gpt-5.1-2025-11-13 | 9.3% |
| #119 | xai-org/grok-4-1-fast-non-reasoning | 9.2% |
| #120 | anthropic/claude-sonnet-4.6 | 9.2% |
| #121 | claude-opus-4-5-20251101 | 9.1% |
| #122 | gpt-5-mini-2025-08-07 | 8.9% |
| #125 | gemini-3-flash-preview | 8.7% |
| #126 | xai-org/grok-4-fast-non-reasoning | 8.7% |
| #127 | gpt-4o | 8.6% |
| #128 | qwen/qwen3-max | 8.3% |
| #129 | gemini-2.5-flash | 8.3% |
| #131 | Llama-2-7b-chat-hf | 8.1% |
| #132 | deepseek-v3 | 8.1% |
| #133 | kimi/kimi-k2.5-thinking | 7.9% |
Compare Models
Model A leads by +5.5%
Model A
Arch-Agent-32B
katanemo/Arch-Agent-32B
Rank #30
BFCL Multi-turn Official: Multi Turn Acc
Value 70.1% · Conf 100.0% · Weight 7.3%
bfcl_multiturn_official.multi_turn_acc (Mar 12, 2026)
BFCL Relevance Detection Official: Relevance Detection
Value 81.3% · Conf 100.0% · Weight 6.5%
bfcl_relevance_detection_official.relevance_detection (Mar 12, 2026)
BFCL Relevance Detection Official: Irrelevance Detection
Value 81.0% · Conf 100.0% · Weight 2.6%
bfcl_relevance_detection_official.irrelevance_detection (Mar 12, 2026)
BFCL Memory Official: Memory Acc
Value 19.8% · Conf 100.0% · Weight 2.4%
bfcl_memory_official.memory_acc (Mar 12, 2026)
Model B
Grok-4-0709
external/xai/grok-4-0709
Rank #58
UGI Leaderboard: Entertainment
Value 100.0% · Conf 100.0% · Weight 2.8%
ugi_main.entertainment (Mar 12, 2026)
UGI Leaderboard: Writing ✍️
Value 99.2% · Conf 100.0% · Weight 2.8%
ugi_main.writing (Mar 12, 2026)
Galileo Agent Leaderboard v2: Avg TSQ
Value 84.6% · Conf 100.0% · Weight 1.2%
galileo_agent_v2.avg_tsq (Mar 12, 2026)
Galileo Agent Leaderboard v2: Avg AC
Value 56.5% · Conf 100.0% · Weight 1.1%
galileo_agent_v2.avg_ac (Mar 12, 2026)
Ranking Diagnostics & Missing Models
Source Lift
Ranked: 42
Sources: 8
Quality: Insufficient
| Source | ID | Rows | Avg lift |
|---|---|---|---|
| Vals CorpFin v2 | vals_corp_fin_v2 | 25 | 0.4% |
| Vals Legal Bench | vals_legal_bench | 24 | 0.5% |
| Vals Tax Eval v2 | vals_tax_eval_v2 | 23 | 0.5% |
| Vals MedQA | vals_medqa | 23 | 0.5% |
Missing Strong Models
gpt-5.2-2025-12-11 (external/openai/gpt-5-2-2025-12-11): Rank #16, 16.2%
anthropic/claude-opus-4-6-thinking (external/anthropic/claude-opus-4-6-thinking): Rank #17, 16.1%
anthropic/claude-opus-4-5-20251101-thinking (external/anthropic/claude-opus-4-5-20251101-thinking): Rank #21, 15.2%
anthropic/claude-sonnet-4-5-20250929-thinking (external/anthropic/claude-sonnet-4-5-20250929-thinking): Rank #28, 14.1%
Related Use Cases
companion
Tarot-style reading
Symbolic, personalized readings with consistent persona.
Top: Arch-Agent-32B
companion
Mindfulness and meditation scripts
Generate calming scripts and exercises tailored to a user's context.
Top: Arch-Agent-32B
companion
Empathetic support chat
Supportive conversation with strong boundaries and safe escalation.
Top: Arch-Agent-32B
companion
Life coaching and goal planning
Goal setting, habit planning, and accountability check-ins.
Top: Arch-Agent-32B