weekly refresh
basedagi.org

Best LLM for Emotional Intelligence

EQ-Bench v3 publishes useful independent results for empathy, emotional reasoning, social dexterity, and creative writing. It is one publisher. BasedAGI withholds a broad EQ winner until two current independent sources cover at least 20 callable models.

no broad winner published

EQ-Bench is one public measurement suite. Its sub-leaderboards cannot independently corroborate each other. Inspect the direct results below while a current two-source, frontier-covering comparison set is missing.
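
The publication rule above can be read as a simple gate. The Python sketch below is one hypothetical interpretation, not BasedAGI's actual implementation. The names `Source` and `can_publish_winner`, the 90-day freshness window, and the reading that each independent publisher must itself cover at least 20 callable models are all assumptions. Leaderboards are grouped by publisher, so several sub-leaderboards from one publisher count as a single source.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, FrozenSet, Iterable, Set

MIN_SOURCES = 2                # "two current independent sources"
MIN_MODELS = 20                # "at least 20 callable models"
MAX_AGE = timedelta(days=90)   # assumption: the page does not define "current"


@dataclass(frozen=True)
class Source:
    """One leaderboard (hypothetical record shape)."""
    publisher: str
    last_updated: date
    callable_models: FrozenSet[str]


def can_publish_winner(leaderboards: Iterable[Source], today: date) -> bool:
    """Return True only if enough independent publishers have enough coverage."""
    coverage: Dict[str, Set[str]] = {}
    for board in leaderboards:
        if today - board.last_updated > MAX_AGE:
            continue  # stale leaderboards do not count as current
        # Merge sub-leaderboards from the same publisher into one source.
        coverage.setdefault(board.publisher, set()).update(board.callable_models)
    qualifying = [p for p, models in coverage.items() if len(models) >= MIN_MODELS]
    return len(qualifying) >= MIN_SOURCES
```

Under these assumptions, any number of EQ-Bench sub-leaderboards alone returns `False`, which matches the "no broad winner published" status.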

▸ what EQ-Bench measures
Demonstrated empathy: Reading emotional context and responding with appropriate acknowledgment.
Pragmatic EI: Practical application of emotional intelligence in realistic scenarios.
Depth of insight: Going beyond surface responses to understand underlying emotional dynamics.
Social dexterity: Navigating interpersonal complexity without blundering.
Validation vs challenge: Knowing when to support and when to push back.
Creative writing quality: Prose quality, vocabulary range, and avoiding formulaic output.
▸ benchmarks used
EQ-Bench v3: Pairwise EI evaluation · 8 dimensions · rubric 0-100
Creative Writing v3: Pairwise prose quality · style, originality, vocabulary
Judgemark: LLM-as-judge calibration · correlation with human raters
Source: eqbench.com by Sam Paech — independent, updated frequently.
▸ analysis
The direct rows are evidence

EQ-Bench rows can distinguish measured models within that source. They do not establish a broad EQ winner without current independent corroboration and comparable coverage.

EQ and reasoning are correlated but not the same

High reasoning scores predict high EQ scores, but not perfectly. Some models that excel at math and code post only mediocre EQ-Bench results. Emotional intelligence appears to draw on a different capability than logical deduction.
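
One way to check how closely two rankings track each other is a rank correlation. The sketch below computes Spearman's rho in plain Python (Pearson correlation on average ranks, so ties are handled). It is an illustration only: the function names are hypothetical, and nothing here reflects how BasedAGI or EQ-Bench actually compares scores.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

Called with per-model reasoning scores and per-model EQ-Bench scores in the same model order, a result near 1 would mean the two rankings largely agree, while a clearly lower value would reflect the divergence described above.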

Creative writing diverges from EQ

The Creative Writing v3 ranking differs from the main EQ ranking. A model can score high on emotional intelligence but produce stilted prose, or write vivid fiction while missing emotional nuance in conversation.

Use cases: therapy, roleplay, fiction

High EQ matters most for companion applications, therapeutic chatbots, character-driven roleplay, and fiction writing tools. For coding or data analysis, the EQ score is irrelevant; choose by coding score instead.

▸ frequently asked

Which LLM has the best emotional intelligence?

No broad EQ winner is published yet. EQ-Bench is public evidence, but a current two-source panel covering at least 20 callable models is required before naming the best model for emotional intelligence.

What is EQ-Bench?

EQ-Bench is an independent benchmark by Sam Paech that evaluates LLMs on eight emotional intelligence dimensions: empathy, pragmatic EI, depth of insight, social dexterity, emotional reasoning, validation/challenge balance, message tailoring, and overall EQ. Models are scored via pairwise comparison with an LLM judge, producing a rubric score (0-100) and an Elo rating.
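
To show how pairwise outcomes can turn into a rating, here is the textbook Elo update in Python. This is a generic sketch and not necessarily the procedure EQ-Bench uses. The K-factor of 32, the 400-point scale, and the function names are conventional or hypothetical choices, not values taken from the benchmark.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Update both ratings after one comparison.

    score_a is 1.0 if the judge preferred A, 0.0 if it preferred B,
    and 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: A gains exactly what B loses.
    return rating_a + delta, rating_b - delta
```

With two models both rated 1500, a single judged win moves the winner to 1516 and the loser to 1484. An upset win over a higher-rated model moves the ratings further than an expected win does.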

Is emotional intelligence benchmarking reliable?

EQ-Bench uses pairwise evaluation and includes Judgemark for evaluator calibration, but those are still sub-leaderboards from one publisher. They are not independent corroboration.