Discourse & Dialogue
TMT: A Simple Way to Translate Topic Models Using Dictionaries
The training of topic models for a multilingual environment is a challenging task, requiring the use of sophisticated algorithms, topic-aligned corpora, and manual evaluation. These difficulties are further exacerbated when the developer lacks knowledge of the target language or is working in an environment with limited data, where only small or unusable multilingual corpora are available. Considering these challenges, we introduce Topic Model Translation (TMT), a novel, robust and transparent technique designed to transfer topic models (e.g., Latent Dirichlet Allocation (LDA) based topic models) from one language to another, without the need for metadata, embeddings, or aligned corpora. TMT enables the reuse of topic models across languages, making it especially suitable for scenarios where large corpora in the target language are unavailable or manual translation is infeasible. Furthermore, we evaluate TMT extensively using both quantitative and qualitative methods, demonstrating that it produces semantically coherent and consistent topic translations.
Building Task Bots with Self-learning for Enhanced Adaptability, Extensibility, and Factuality
This thesis examines the obstacles and potential solutions for creating such bots, focusing on innovative techniques that enable bots to learn and adapt autonomously in constantly changing environments. End-to-end task bots, typically built using a static and limited corpus, face difficulties when deployed online due to three primary factors tied to this limitation. First, they might confront queries featuring unexpected linguistic patterns or slot values (i.e., unseen user behaviors). Second, they could potentially face requirements for new functions or tasks (i.e., task definition extensions). Third, even when equipped with relevant knowledge, these bots may produce responses that appear plausible but are actually incorrect (i.e., "hallucinations"). Addressing these challenges is vital for enhancing task bots' performance and reliability in real-world settings.
Beyond the Black Box: Integrating Lexical and Semantic Methods in Quantitative Discourse Analysis with BERTopic
Quantitative Discourse Analysis has seen growing adoption with the rise of Large Language Models and computational tools. However, reliance on black box software such as MAXQDA and NVivo risks undermining methodological transparency and alignment with research goals. This paper presents a hybrid, transparent framework for QDA that combines lexical and semantic methods to enable triangulation, reproducibility, and interpretability. Drawing from a case study in historical political discourse, we demonstrate how custom Python pipelines using NLTK, spaCy, and Sentence Transformers allow fine-grained control over preprocessing, lemmatisation, and embedding generation. We further detail our iterative BERTopic modelling process, incorporating UMAP dimensionality reduction, HDBSCAN clustering, and c-TF-IDF keyword extraction, optimised through parameter tuning and multiple runs to enhance topic coherence and coverage. By juxtaposing precise lexical searches with context-aware semantic clustering, we argue for a multi-layered approach that mitigates the limitations of either method in isolation. Our workflow underscores the importance of code-level transparency, researcher agency, and methodological triangulation in computational discourse studies. Code and supplementary materials are available via GitHub.
EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems
Liu, Jingwen, Cheng, Kan Jen, Lian, Jiachen, Anand, Akshay, Jain, Rishi, Qiao, Faith, Netzorg, Robin, Chou, Huang-Cheng, Li, Tingle, Lin, Guan-Ting, Anumanchipalli, Gopala
--Speech emotions play a crucial role in human-computer interaction, shaping engagement and context-aware communication. Despite recent advances in spoken dialogue systems, a holistic system for evaluating emotional reasoning is still lacking. T o address this, we introduce EMO-Reasoning, a benchmark for assessing emotional coherence in dialogue systems. It leverages a curated dataset generated via text-to-speech to simulate diverse emotional states, overcoming the scarcity of emotional speech data. We further propose the Cross-turn Emotion Reasoning Score to assess the emotion transitions in multi-turn dialogues. Evaluating seven dialogue systems through continuous, categorical, and perceptual metrics, we show that our framework effectively detects emotional inconsistencies, providing insights for improving current dialogue systems. By releasing a systematic evaluation benchmark, we aim to advance emotion-aware spoken dialogue modeling toward more natural and adaptive interactions. Spoken communication relies on more than just the words we use.
Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation
Devanathan, Rishikesh, Nathan, Varun, Kumar, Ayush
Synthetic transcript generation is critical in contact center domains, where privacy and data scarcity limit model training and evaluation. Unlike prior synthetic dialogue generation work on open-domain or medical dialogues, contact center conversations are goal-oriented, role-asymmetric, and behaviorally complex, featuring disfluencies, ASR noise, and compliance-driven agent actions. In deployments where transcripts are unavailable, standard pipelines still yield derived call attributes such as Intent Summaries, Topic Flow, and QA Evaluation Forms. We leverage these as supervision signals to guide generation. To assess the quality of such outputs, we introduce a diagnostic framework of 18 linguistically and behaviorally grounded metrics for comparing real and synthetic transcripts. We benchmark four language-agnostic generation strategies, from simple prompting to characteristic-aware multi-stage approaches, alongside reference-free baselines. Results reveal persistent challenges: no method excels across all traits, with notable deficits in disfluency, sentiment, and behavioral realism. Our diagnostic tool exposes these gaps, enabling fine-grained evaluation and stress testing of synthetic dialogue across languages.
SentiMM: A Multimodal Multi-Agent Framework for Sentiment Analysis in Social Media
Xu, Xilai, Zhao, Zilin, Song, Chengye, Wang, Zining, Qiang, Jinhe, Yan, Jiongrui, Lin, Yuhuai
With the increasing prevalence of multimodal content on social media, sentiment analysis faces significant challenges in effectively processing heterogeneous data and recognizing multi-label emotions. Existing methods often lack effective cross-modal fusion and external knowledge integration. We propose SentiMM, a novel multi-agent framework designed to systematically address these challenges. SentiMM processes text and visual inputs through specialized agents, fuses multimodal features, enriches context via knowledge retrieval, and aggregates results for final sentiment classification. We also introduce SentiMMD, a large-scale multimodal dataset with seven fine-grained sentiment categories. Extensive experiments demonstrate that SentiMM achieves superior performance compared to state-of-the-art baselines, validating the effectiveness of our structured approach.
EduRABSA: An Education Review Dataset for Aspect-based Sentiment Analysis Tasks
Hua, Yan Cathy, Denny, Paul, Wicker, Jörg, Taskova, Katerina
Every year, most educational institutions seek and receive an enormous volume of text feedback from students on courses, teaching, and overall experience. Yet, turning this raw feedback into useful insights is far from straightforward. It has been a long-standing challenge to adopt automatic opinion mining solutions for such education review text data due to the content complexity and low-granularity reporting requirements. Aspect-based Sentiment Analysis (ABSA) offers a promising solution with its rich, sub-sentence-level opinion mining capabilities. However, existing ABSA research and resources are very heavily focused on the commercial domain. In education, they are scarce and hard to develop due to limited public datasets and strict data protection. A high-quality, annotated dataset is urgently needed to advance research in this under-resourced area. In this work, we present EduRABSA (Education Review ABSA), the first public, annotated ABSA education review dataset that covers three review subject types (course, teaching staff, university) in the English language and all main ABSA tasks, including the under-explored implicit aspect and implicit opinion extraction. We also share ASQE-DPT (Data Processing Tool), an offline, lightweight, installation-free manual data annotation tool that generates labelled datasets for comprehensive ABSA tasks from a single-task annotation. Together, these resources contribute to the ABSA community and education domain by removing the dataset barrier, supporting research transparency and reproducibility, and enabling the creation and sharing of further resources. The dataset, annotation tool, and scripts and statistics for dataset processing and sampling are available at https://github.com/yhua219/edurabsa_dataset_and_annotation_tool.
PGF-Net: A Progressive Gated-Fusion Framework for Efficient Multimodal Sentiment Analysis
We introduce PGF-Net (Progressive Gated-Fusion Network), a novel deep learning framework designed for efficient and interpretable multimodal sentiment analysis. Our framework incorporates three primary innovations. Firstly, we propose a Progressive Intra-Layer Fusion paradigm, where a Cross-Attention mechanism empowers the textual representation to dynamically query and integrate non-linguistic features from audio and visual streams within the deep layers of a Transformer encoder. This enables a deeper, context-dependent fusion process. Secondly, the model incorporates an Adaptive Gated Arbitration mechanism, which acts as a dynamic controller to balance the original linguistic information against the newly fused multimodal context, ensuring stable and meaningful integration while preventing noise from overwhelming the signal. Lastly, a hybrid Parameter-Efficient Fine-Tuning (PEFT) strategy is employed, synergistically combining global adaptation via LoRA with local refinement through Post-Fusion Adapters. This significantly reduces trainable parameters, making the model lightweight and suitable for resource-limited scenarios. These innovations are integrated into a hierarchical encoder architecture, enabling PGF-Net to perform deep, dynamic, and interpretable multimodal sentiment analysis while maintaining exceptional parameter efficiency. Experimental results on MOSI dataset demonstrate that our proposed PGF-Net achieves state-of-the-art performance, with a Mean Absolute Error (MAE) of 0.691 and an F1-Score of 86.9%. Notably, our model achieves these results with only 3.09M trainable parameters, showcasing a superior balance between performance and computational efficiency.
Do Language Models Agree with Human Perceptions of Suspense in Stories?
Matlin, Glenn, Zhang, Devin, Loza, Rodrigo Barroso, Popescu, Diana M., Isbell, Joni, Chakraborty, Chandreyi, Riedl, Mark
Suspense is an affective response to narrative text that is believed to involve complex cognitive processes in humans. Several psychological models have been developed to describe this phenomenon and the circumstances under which text might trigger it. We replicate four seminal psychological studies of human perceptions of suspense, substituting human responses with those of different open-weight and closed-source LMs. We conclude that while LMs can distinguish whether a text is intended to induce suspense in people, LMs cannot accurately estimate the relative amount of suspense within a text sequence as compared to human judgments, nor can LMs properly capture the human perception for the rise and fall of suspense across multiple text segments. We probe the abilities of LM suspense understanding by adversarially permuting the story text to identify what cause human and LM perceptions of suspense to diverge. We conclude that, while LMs can superficially identify and track certain facets of suspense, they do not process suspense in the same way as human readers.
PyTOD: Programmable Task-Oriented Dialogue with Execution Feedback
Coca, Alexandru, Tseng, Bo-Hsiang, Boothroyd, Pete, Cheng, Jianpeng, Gaynor, Mark, Zhang, Zhenxing, Stacey, Joe, Guigue, Tristan, Alonso, Héctor Martinez, Séaghdha, Diarmuid Ó, Johannsen, Anders
Programmable task-oriented dialogue (TOD) agents enable language models to follow structured dialogue policies, but their effectiveness hinges on accurate state tracking. We present PyTOD, an agent that generates executable code to track dialogue state and uses policy and execution feedback for efficient error correction. To this end, PyTOD employs a simple constrained decoding approach, using a language model instead of grammar rules to follow API schemata. This leads to state-of-the-art state tracking performance on the challenging SGD benchmark. Our experiments show that PyTOD surpasses strong baselines in both accuracy and robust user goal estimation as the dialogue progresses, demonstrating the effectiveness of execution-aware state tracking.