SEER: The Span-based Emotion Evidence Retrieval Benchmark
Aneesha Sampath, Oya Aran, Emily Mower Provost
arXiv.org Artificial Intelligence
We introduce the SEER (Span-based Emotion Evidence Retrieval) Benchmark to test Large Language Models' (LLMs) ability to identify the specific spans of text that express emotion. Unlike traditional emotion recognition tasks that assign a single label to an entire sentence, SEER targets the underexplored task of emotion evidence detection: pinpointing which exact phrases convey emotion. This span-level approach is crucial for applications like empathetic dialogue and clinical support, which need to know how emotion is expressed, not just what the emotion is. SEER includes two tasks: identifying emotion evidence within a single sentence, and identifying evidence across a short passage of five consecutive sentences. It contains new annotations for both emotion and emotion evidence on 1200 real-world sentences. We evaluate 14 open-source LLMs and find that, while some models approach average human performance on single-sentence inputs, their accuracy degrades in longer passages. Our error analysis reveals key failure modes, including overreliance on emotion keywords and false positives in neutral text.
Oct-29-2025
- Country:
- Asia > China
- Hong Kong (0.04)
- North America
- Dominican Republic (0.04)
- United States > Michigan (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine (0.67)