narrative text
LiteraryQA: Towards Effective Evaluation of Long-document Narrative QA
Bonomo, Tommaso, Gioffré, Luca, Navigli, Roberto
Question Answering (QA) on narrative text poses a unique challenge to current systems, requiring a deep understanding of long, complex documents. However, the reliability of NarrativeQA, the most widely used benchmark in this domain, is hindered by noisy documents and flawed QA pairs. In this work, we introduce LiteraryQA, a high-quality subset of NarrativeQA focused on literary works. Using a human- and LLM-validated pipeline, we identify and correct low-quality QA samples while removing extraneous text from source documents. We then carry out a meta-evaluation of automatic metrics to clarify how systems should be evaluated on LiteraryQA. This analysis reveals that all n-gram-based metrics have a low system-level correlation to human judgment, while LLM-as-a-Judge evaluations, even with small open-weight models, can strongly agree with the ranking identified by humans. Finally, we benchmark a set of long-context LLMs on LiteraryQA. We release our code and data at https://github.com/SapienzaNLP/LiteraryQA.
- Europe > Austria > Vienna (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
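The LiteraryQA abstract above compares automatic metrics with human judgment at the system level. Below is a minimal sketch of one way to compute such a correlation: average each system's per-sample scores, then compare the two resulting rankings with Kendall's tau. The abstract does not name the coefficient, so Kendall's tau is an assumption, as are the function names and the dictionary input format.

```python
from itertools import combinations
from statistics import mean


def kendall_tau(xs, ys):
    """Kendall tau-a between two equally long score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def system_level_correlation(metric_scores, human_scores):
    """Correlate per-system mean metric scores with per-system mean human scores.

    Both arguments map a system name to a list of per-sample scores
    (hypothetical input format, chosen for illustration).
    """
    systems = sorted(metric_scores)
    metric_means = [mean(metric_scores[s]) for s in systems]
    human_means = [mean(human_scores[s]) for s in systems]
    return kendall_tau(metric_means, human_means)
```

A value near 1 means the metric ranks systems the way human annotators do; a value near 0 means the two rankings are essentially unrelated.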
Chronological Passage Assembling in RAG framework for Temporal Question Answering
Kim, Byeongjeong, Park, Jeonghyun, Yang, Joonho, Lee, Hwanhee
Long-context question answering over narrative tasks is challenging because correct answers often hinge on reconstructing a coherent timeline of events while preserving contextual flow in a limited context window. Retrieval-augmented generation (RAG) methods aim to address this challenge by selectively retrieving only necessary document segments. However, narrative texts possess unique characteristics that limit the effectiveness of these existing approaches. Specifically, understanding narrative texts requires more than isolated segments, as the broader context and sequential relationships between segments are crucial for comprehension. To address these limitations, we propose ChronoRAG, a novel RAG framework specialized for narrative texts. This approach focuses on two essential aspects: refining dispersed document information into coherent and structured passages and preserving narrative flow by explicitly capturing and maintaining the temporal order among retrieved passages. We empirically demonstrate the effectiveness of ChronoRAG through experiments on the NarrativeQA and GutenQA datasets, showing substantial improvements in tasks requiring both factual identification and comprehension of complex sequential relationships, underscoring that reasoning over temporal order is crucial in resolving narrative QA.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > France (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
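The ChronoRAG abstract above stresses keeping retrieved passages in temporal order. The following is a minimal sketch of that general idea, not the authors' implementation: select the top-k passages by relevance, then re-sort them by their position in the source document. The `Passage` class, its fields, and the use of document position as a proxy for temporal order are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    position: int   # index of the passage in the source document (assumed proxy for time)
    text: str
    score: float    # retrieval relevance score, higher is better


def assemble_chronologically(passages, k):
    """Keep the k highest-scoring passages, returned in document order."""
    top_k = sorted(passages, key=lambda p: p.score, reverse=True)[:k]
    return sorted(top_k, key=lambda p: p.position)


def build_context(passages, k):
    """Join the chronologically ordered passages into one prompt context."""
    return "\n\n".join(p.text for p in assemble_chronologically(passages, k))
```

Sorting by relevance alone would hand the reader model events out of sequence; the second sort restores the order in which the narrative presents them.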
SymbolicThought: Integrating Language Models and Symbolic Reasoning for Consistent and Interpretable Human Relationship Understanding
Zhao, Runcong, Zhu, Qinglin, Xu, Hainiu, Liang, Bin, Gui, Lin, He, Yulan
Understanding character relationships is essential for interpreting complex narratives and conducting socially grounded AI research. However, manual annotation is time-consuming and low in coverage, while large language models (LLMs) often produce hallucinated or logically inconsistent outputs. We present SymbolicThought, a human-in-the-loop framework that combines LLM-based extraction with symbolic reasoning. The system constructs editable character relationship graphs, refines them using seven types of logical constraints, and enables real-time validation and conflict resolution through an interactive interface. To support logical supervision and explainable social analysis, we release a dataset of 160 interpersonal relationships with corresponding logical structures. Experiments show that SymbolicThought improves annotation accuracy and consistency while significantly reducing time cost, offering a practical tool for narrative understanding, explainable AI, and LLM evaluation.
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
On The Role of Intentionality in Knowledge Representation: Analyzing Scene Context for Cognitive Agents with a Tiny Language Model
Cognitive abilities, which include ideas like intentionality and consciousness, have long been viewed in Western philosophy as exclusive to the human realm. Intent is roundly considered justifiable only with minimum requirements for self-awareness or situational comprehension. However, such hard-line views have softened gradually with modern enlightenment, and more of us are likely to accept that terms such as 'agency', 'intelligence', and even 'emotion' can apply for other species too. Even plants lean into sunlight in an intentional way; the identification of an intention doesn't have to arise from the plant to be true. Latterly their possibility has been extended even to artificial systems, which some find more acceptable, though a modern version of the privilege argument persists in a distinction between 'simple' machinery and 'complex' biology, which many believe still holds some principled leap in understanding. Ideological 'blood-brain barriers', like these, continue to undermine efforts to form a rational causal explanation of intent, leading extremists to clutch at esoteric straws like quantum mechanics or complexity theory to account for perceived magic. In this note, I address another apparent schism that may shed light on these questions: the difference between process dynamics (the realm of physics) and interpretive semantics (the realm of linguistics and philosophy), and the suggestion that (deep down) intentionality might be a relatively simple phenomenon with an energetic explanation (as trust has been shown to be [9]). The recent acceptance of attention mechanisms in Large Language Models is a related example [19, 22].
- South America (0.04)
- North America > United States > Illinois (0.04)
- Health & Medicine > Therapeutic Area > Neurology (0.48)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)
Coreference Resolution for Vietnamese Narrative Texts
Tran, Hieu-Dai, Nguyen, Duc-Vu, Nguyen, Ngan Luu-Thuy
Coreference resolution is a vital task in natural language processing (NLP) that involves identifying and linking different expressions in a text that refer to the same entity. This task is particularly challenging for Vietnamese, a low-resource language with limited annotated datasets. To address these challenges, we developed a comprehensive annotated dataset using narrative texts from VnExpress, a widely-read Vietnamese online news platform. We established detailed guidelines for annotating entities, focusing on ensuring consistency and accuracy. Additionally, we evaluated the performance of large language models (LLMs), specifically GPT-3.5-Turbo and GPT-4, on this dataset. Our results demonstrate that GPT-4 significantly outperforms GPT-3.5-Turbo in terms of both accuracy and response consistency, making it a more reliable tool for coreference resolution in Vietnamese.
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
- Asia > Thailand (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Beyond LLMs: A Linguistic Approach to Causal Graph Generation from Narrative Texts
Li, Zehan, Pan, Ruhua, Pi, Xinyu
We propose a novel framework for generating causal graphs from narrative texts, bridging high-level causality and detailed event-specific relationships. Our method first extracts concise, agent-centered vertices using large language model (LLM)-based summarization. We introduce an "Expert Index," comprising seven linguistically informed features, integrated into a Situation-Task-Action-Consequence (STAC) classification model. This hybrid system, combining RoBERTa embeddings with the Expert Index, achieves superior precision in causal link identification compared to pure LLM-based approaches. Finally, a structured five-iteration prompting process refines and constructs connected causal graphs. Experiments on 100 narrative chapters and short stories demonstrate that our approach consistently outperforms GPT-4o and Claude 3.5 in causal graph quality, while maintaining readability. The open-source tool provides an interpretable, efficient solution for capturing nuanced causal chains in narratives.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Greenland (0.04)
A Scalable Data-Driven Framework for Systematic Analysis of SEC 10-K Filings Using Large Language Models
Daimi, Syed Affan, Iqbal, Asma
The number of companies listed on the NYSE has been growing exponentially, creating a significant challenge for market analysts, traders, and stockholders who must monitor and assess the performance and strategic shifts of a large number of companies regularly. There is an increasing need for a fast, cost-effective, and comprehensive method to evaluate the performance and detect and compare many companies' strategy changes efficiently. We propose a novel data-driven approach that leverages large language models (LLMs) to systematically analyze and rate the performance of companies based on their SEC 10-K filings. These filings, which provide detailed annual reports on a company's financial performance and strategic direction, serve as a rich source of data for evaluating various aspects of corporate health, including confidence, environmental sustainability, innovation, and workforce management. We also introduce an automated system for extracting and preprocessing 10-K filings. This system accurately identifies and segments the required sections as outlined by the SEC, while also isolating key textual content that contains critical information about the company. This curated data is then fed into Cohere's Command-R+ LLM to generate quantitative ratings across various performance metrics. These ratings are subsequently processed and visualized to provide actionable insights. The proposed scheme is then implemented on an interactive GUI as a no-code solution for running the data pipeline and creating the visualizations. The application showcases the rating results and provides year-on-year comparisons of company performance.
- North America > United States (0.67)
- Europe > Switzerland (0.04)
- Law (1.00)
- Banking & Finance > Trading (1.00)
- Government > Regional Government > North America Government > United States Government (0.67)
MetaBGM: Dynamic Soundtrack Transformation For Continuous Multi-Scene Experiences With Ambient Awareness And Personalization
Liu, Haoxuan, Wang, Zihao, Hong, Haorong, Feng, Youwei, Yu, Jiaxin, Diao, Han, Xu, Yunfei, Zhang, Kejun
This paper introduces MetaBGM, a groundbreaking framework for generating background music that adapts to dynamic scenes and real-time user interactions. We define multi-scene as variations in environmental contexts, such as transitions in game settings or movie scenes. To tackle the challenge of converting backend data into music description texts for audio generation models, MetaBGM employs a novel two-stage generation approach that transforms continuous scene and user state data into these texts, which are then fed into an audio generation model for real-time soundtrack creation. Experimental results demonstrate that MetaBGM effectively generates contextually relevant and dynamic background music for interactive applications.
- Media > Music (1.00)
- Leisure & Entertainment > Games > Computer Games (0.47)
Renard: A Modular Pipeline for Extracting Character Networks from Narrative Texts
Amalvy, Arthur, Labatut, Vincent, Dufour, Richard
Renard (Relationships Extraction from NARrative Documents) is a Python library that allows users to define custom natural language processing (NLP) pipelines to extract character networks from narrative texts. Contrary to the few existing tools, Renard can extract dynamic networks, as well as the more common static networks. Renard pipelines are modular: users can choose the implementation of each NLP subtask needed to extract a character network. This allows users to specialize pipelines to particular types of texts and to study the impact of each subtask on the extracted network.
- Europe > Switzerland > Vaud > Lausanne (0.05)
- Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.05)
MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Fu, Ruibo, Shi, Shuchen, Guo, Hongming, Wang, Tao, Qiang, Chunyu, Wen, Zhengqi, Tao, Jianhua, Qi, Xin, Lu, Yi, Wang, Xiaopeng, Wang, Zhiyong, Liu, Yukun, Liu, Xuefei, Zhang, Shuai, Li, Guanjun
Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio (TTA) technology, which relies on detailed and acoustically relevant textual descriptions, falls short in practical video dubbing applications. Existing datasets like AudioSet, AudioCaps, Clotho, Sound-of-Story, and WavCaps do not fully meet the requirements of real-world foley audio dubbing tasks. To address this, we introduce the Multi-modal Image and Narrative Text Dubbing Dataset (MINT), designed to enhance mainstream dubbing tasks such as literary story audiobook dubbing and image/silent video dubbing. In addition, to address the limitations of existing TTA technology in understanding and planning complex prompts, a Foley Audio Content Planning, Generation, and Alignment (CPGA) framework is proposed, which includes a content planning module leveraging large language models for complex multi-modal prompt comprehension. Additionally, the training process is optimized using Proximal Policy Optimization based reinforcement learning, significantly improving the alignment and auditory realism of generated foley audio. Experimental results demonstrate that our approach significantly advances the field of foley audio dubbing, providing robust solutions for the challenges of multi-modal dubbing. Even when utilizing the relatively lightweight GPT-2 model, our framework outperforms open-source multimodal large models such as LLaVA, DeepSeek-VL, and Moondream2. The dataset is available at https://github.com/borisfrb/MINT.