Best LLM for Emotional Intelligence
EQ-Bench v3 publishes useful independent results for empathy, emotional reasoning, social dexterity, and creative writing, but all of these results come from a single publisher. BasedAGI withholds a broad EQ winner until two current independent sources cover at least 20 callable models.
EQ-Bench is a single public measurement suite, so its sub-leaderboards cannot independently corroborate one another. Until a current two-source comparison set covering frontier models exists, inspect the direct results below.
EQ-Bench rows can distinguish measured models within that source. They do not establish a broad EQ winner without current independent corroboration and comparable coverage.
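The publication rule above can be read as a simple gate. The sketch below is one possible encoding, assuming that "two current independent sources cover at least 20 callable models" means two different publishers whose current results share at least 20 models. The `SourceResult` type, its fields, and `can_name_winner` are hypothetical names for illustration, not BasedAGI's actual implementation.

```python
from dataclasses import dataclass
from itertools import combinations

# Threshold from the publication rule; the shared-coverage reading is an assumption.
MIN_SHARED_MODELS = 20


@dataclass(frozen=True)
class SourceResult:
    """One leaderboard's results. All field names here are hypothetical."""
    publisher: str           # organisation that runs the benchmark
    leaderboard: str         # sub-leaderboard name
    is_current: bool         # whether the results are recent enough to count
    models: frozenset[str]   # callable models that this leaderboard scored


def can_name_winner(results: list[SourceResult]) -> bool:
    """True only if two different publishers, both current, score >= 20 of the same models."""
    # Merge sub-leaderboards by publisher: leaderboards from one publisher
    # count as a single source, so they cannot corroborate each other.
    coverage: dict[str, set[str]] = {}
    for r in results:
        if r.is_current:
            coverage.setdefault(r.publisher, set()).update(r.models)

    # Look for any pair of independent publishers with enough shared models.
    for a, b in combinations(coverage, 2):
        if len(coverage[a] & coverage[b]) >= MIN_SHARED_MODELS:
            return True
    return False
```

Under this reading, two EQ-Bench sub-leaderboards (for example the main EQ board and Creative Writing v3) return False even when they score exactly the same models, because they collapse into one publisher.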
High reasoning scores tend to go with high EQ scores, but the relationship is imperfect. Some models that excel at math and code post only middling EQ-Bench scores. Emotional intelligence appears to draw on a different kind of capability than logical deduction.
The Creative Writing v3 ranking differs from the main EQ ranking. A model can score high on emotional intelligence but produce stilted prose, or write vivid fiction while missing emotional nuance in conversation.
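One way to put a number on how far two rankings diverge is Spearman's rank correlation over the models both leaderboards scored: 1.0 means identical orderings, and lower values mean the rankings disagree more. This is a generic statistical sketch, not a method EQ-Bench or BasedAGI is known to use; `rank_agreement` and `average_ranks` are hypothetical helpers, and the inputs are assumed to be plain model-to-score mappings.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n (ascending); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_agreement(scores_a: dict[str, float], scores_b: dict[str, float]) -> float:
    """Spearman rank correlation over the models both leaderboards scored."""
    shared = sorted(scores_a.keys() & scores_b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two models scored by both leaderboards")
    ra = average_ranks([scores_a[m] for m in shared])
    rb = average_ranks([scores_b[m] for m in shared])
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    n = len(shared)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("one leaderboard gives every shared model the same score")
    return cov / (var_a * var_b) ** 0.5
```

Ranking rather than raw scores is used because the two leaderboards need not share a scale; only the ordering of shared models is compared, and models scored by only one leaderboard are ignored.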
High EQ matters most for companion applications, therapeutic chatbots, character-driven roleplay, and fiction writing tools. For coding or data analysis, EQ score is irrelevant; pick by coding score instead.
Which LLM has the best emotional intelligence?
BasedAGI has not yet published a broad EQ winner. EQ-Bench is public evidence, but BasedAGI requires a current two-source panel covering at least 20 callable models before naming the best model for emotional intelligence.
What is EQ-Bench?
EQ-Bench is an independent benchmark by Sam Paech that evaluates LLMs on eight emotional intelligence dimensions: empathy, pragmatic EI, depth of insight, social dexterity, emotional reasoning, validation/challenge balance, message tailoring, and overall EQ. An LLM judge grades model responses, producing a rubric score (0-100) and, through pairwise comparisons between models, an Elo rating.
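EQ-Bench's Elo rating comes from pairwise judgments. Its exact rating procedure is not described here, so the sketch below only shows the textbook Elo update that turns one judged comparison into rating changes. The K-factor of 32 and starting rating of 1500 are conventional defaults assumed for illustration, and `apply_judgement` is a hypothetical helper, not part of EQ-Bench.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def apply_judgement(
    ratings: dict[str, float],
    model_a: str,
    model_b: str,
    score_a: float,
    k: float = 32.0,
    initial: float = 1500.0,
) -> None:
    """Update both ratings in place after one judged comparison.

    score_a is 1.0 if the judge preferred A, 0.0 if it preferred B, 0.5 for a tie.
    k and initial are conventional Elo defaults, not EQ-Bench settings.
    """
    ra = ratings.get(model_a, initial)
    rb = ratings.get(model_b, initial)
    ea = expected_score(ra, rb)
    ratings[model_a] = ra + k * (score_a - ea)
    # B's actual and expected scores are the complements of A's.
    ratings[model_b] = rb + k * ((1.0 - score_a) - (1.0 - ea))
```

For two new models at 1500, a single win moves the winner to 1516 and the loser to 1484. An online update like this depends on the order in which comparisons arrive; a leaderboard may instead fit ratings over all comparisons at once, so treat this as an illustration of the idea only.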
Is emotional intelligence benchmarking reliable?
EQ-Bench uses pairwise evaluation and includes Judgemark for evaluator calibration. Those safeguards help, but Judgemark and the other sub-leaderboards all come from one publisher, so they are not independent corroboration.