DiscoTrack: A Multilingual LLM Benchmark for Discourse Tracking

Nov-11-2025–arXiv.org Artificial Intelligence

Recent LLM benchmarks test models on a range of phenomena but still focus primarily on natural language understanding for the extraction of explicit information, as in QA or summarization, with responses often targeting information from individual sentences. More challenging benchmarks, and importantly multilingual ones, are still lacking for implicit information and pragmatic inference across larger documents in the context of discourse tracking, that is, integrating and aggregating information across sentences, paragraphs and multiple speaker utterances. To this end, we present DiscoTrack, an LLM benchmark targeting a range of tasks across 12 languages and four levels of discourse understanding: salience recognition, entity tracking, discourse relations and bridging inference. Our evaluation shows that these tasks remain challenging, even for state-of-the-art models.

artificial intelligence, large language model, natural language

Country:
- North America > United States (1.00)
- Europe (0.67)
- Asia > Middle East
  - UAE (0.28)

Genre:
- Research Report (1.00)

Industry:
- Education (0.67)
- Government > Regional Government
  - North America Government > United States Government (0.68)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
