Discourse & Dialogue
Tailored Conversations beyond LLMs: A RL-Based Dialogue Manager
Galland, Lucie, Pelachaud, Catherine, Pecune, Florian
In this work, we propose a novel framework that integrates large language models (LLMs) with an RL-based dialogue manager for open-ended dialogue with a specific goal. By leveraging hierarchical reinforcement learning to model the structured phases of dialogue and employing meta-learning to adapt across diverse user profiles, our approach improves adaptability and efficiency, enabling the system to learn from limited data, transition fluidly between dialogue phases, and personalize responses to heterogeneous patient needs. We apply our framework to Motivational Interviews, aiming to foster behavior change, and demonstrate that the proposed dialogue manager outperforms a state-of-the-art LLM baseline in terms of reward, showing a potential benefit of conditioning LLMs to create open-ended dialogue systems with specific goals.
Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis
Luo, Miaosen, Jiang, Yuncheng, Mai, Sijie
Multimodal Sentiment Analysis (MSA) faces two critical challenges: the lack of interpretability in the decision logic of multimodal fusion and modality imbalance caused by disparities in inter-modal information density. To address these issues, we propose KAN-MCP, a novel framework that integrates the interpretability of Kolmogorov-Arnold Networks (KAN) with the robustness of the Multimodal Clean Pareto (MCPareto) framework. First, KAN leverages its univariate function decomposition to achieve transparent analysis of cross-modal interactions. This structural design allows direct inspection of feature transformations without relying on external interpretation tools, thereby ensuring both high expressiveness and interpretability. Second, the proposed MCPareto enhances robustness by addressing modality imbalance and noise interference. Specifically, we introduce the Dimensionality Reduction and Denoising Modal Information Bottleneck (DRD-MIB) method, which jointly denoises and reduces feature dimensionality. This approach provides KAN with discriminative low-dimensional inputs to reduce the modeling complexity of KAN while preserving critical sentiment-related information. Furthermore, MCPareto dynamically balances gradient contributions across modalities using the purified features output by DRD-MIB, ensuring lossless transmission of auxiliary signals and effectively alleviating modality imbalance. This synergy of interpretability and robustness not only achieves superior performance on benchmark datasets such as CMU-MOSI, CMU-MOSEI, and CH-SIMS v2 but also offers an intuitive visualization interface through KAN's interpretable architecture. Our code is released on https://github.com/LuoMSen/KAN-MCP.
VaxPulse: Monitoring of Online Public Concerns to Enhance Post-licensure Vaccine Surveillance
Javed, Muhammad, Khademi, Sedigh, Hickman, Joanne, Buttery, Jim, Clothier, Hazel, Dimaguila, Gerardo Luis
The recent vaccine-related infodemic has amplified public concerns, highlighting the need for proactive misinformation management. We describe how we enhanced the reporting surveillance system of Victoria's vaccine safety service, SAEFVIC, through the incorporation of new information sources for public sentiment analysis, topics of discussion, and hesitancies about vaccinations online. Using VaxPulse, a multi-step framework, we integrate adverse events following immunisation (AEFI) with sentiment analysis, demonstrating the importance of contextualising public concerns. Additionally, we emphasise the need to address non-English languages to stratify concerns across ethno-lingual communities, providing valuable insights for vaccine uptake strategies and combating mis/disinformation. The framework is applied to real-world examples and a case study on women's vaccine hesitancy, showcasing its benefits and adaptability by identifying public opinion from online media.
Dual Modality-Aware Gated Prompt Tuning for Few-Shot Multimodal Sarcasm Detection
Jana, Soumyadeep, Kundu, Abhrajyoti, Singh, Sanasam Ranbir
The widespread use of multimodal content on social media has heightened the need for effective sarcasm detection to improve opinion mining. However, existing models rely heavily on large annotated datasets, making them less suitable for real-world scenarios where labeled data is scarce. This motivates the need to explore the problem in a few-shot setting. To this end, we introduce DMDP (Deep Modality-Disentangled Prompt Tuning), a novel framework for few-shot multimodal sarcasm detection. Unlike prior methods that use shallow, unified prompts across modalities, DMDP employs gated, modality-specific deep prompts for text and visual encoders. These prompts are injected across multiple layers to enable hierarchical feature learning and better capture diverse sarcasm types. To enhance intra-modal learning, we incorporate a prompt-sharing mechanism across layers, allowing the model to aggregate both low-level and high-level semantic cues. Additionally, a cross-modal prompt alignment module enables nuanced interactions between image and text representations, improving the model's ability to detect subtle sarcastic intent. Experiments on two public datasets demonstrate DMDP's superior performance in both few-shot and extremely low-resource settings. Further cross-dataset evaluations show that DMDP generalizes well across domains, consistently outperforming baseline methods.
Backtesting Sentiment Signals for Trading: Evaluating the Viability of Alpha Generation from Sentiment Analysis
Pontes, Elvys Linhares, González-Gallardo, Carlos-Emiliano, Bordea, Georgeta, Moreno, José G., Jannet, Mohamed Ben, Zhao, Yuxuan, Doucet, Antoine
Sentiment analysis, widely used in product reviews, also impacts financial markets by influencing asset prices through microblogs and news articles. Despite research in sentiment-driven finance, many studies focus on sentence-level classification, overlooking its practical application in trading. This study bridges that gap by evaluating sentiment-based trading strategies for generating positive alpha. We conduct a backtesting analysis using sentiment predictions from three models (two classification and one regression) applied to news articles on Dow Jones 30 stocks, comparing them to the benchmark Buy&Hold strategy. Results show all models produced positive returns, with the regression model achieving the highest return of 50.63% over 28 months, outperforming the benchmark Buy&Hold strategy. This highlights the potential of sentiment in enhancing investment strategies and financial decision-making.
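The comparison described above, a sentiment-driven strategy against Buy&Hold, can be illustrated with a minimal sketch. It is not the authors' backtesting pipeline: the function names, the +1/0/-1 signal encoding, the absence of transaction costs, and the toy prices are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def cumulative_return(returns: Sequence[float]) -> float:
    """Compound a sequence of simple period returns into one total return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def backtest(prices: Sequence[float], signals: Sequence[int]) -> Tuple[float, float]:
    """Return (strategy_return, buy_and_hold_return).

    Assumption: signals[t] is a hypothetical sentiment signal known at the
    close of period t, encoded as +1 (long), 0 (flat), or -1 (short). It is
    applied to the price move from t to t+1, so no future information is used.
    Transaction costs are ignored.
    """
    if len(prices) != len(signals):
        raise ValueError("prices and signals must have the same length")
    # Simple return of the asset over each period.
    period_returns = [prices[t + 1] / prices[t] - 1.0 for t in range(len(prices) - 1)]
    # Position taken at t earns (or loses) the return realised over t -> t+1.
    strategy_returns = [signals[t] * period_returns[t] for t in range(len(period_returns))]
    return cumulative_return(strategy_returns), cumulative_return(period_returns)


# Toy data (hypothetical): the last signal is unused because no later price exists.
prices = [100.0, 110.0, 99.0, 108.9]
signals = [1, -1, 1, 0]
strategy, benchmark = backtest(prices, signals)
print(f"strategy: {strategy:.3f}, buy&hold: {benchmark:.3f}")
```

Lagging the signal by one period is the key design choice: a strategy that trades on sentiment published after the price move it is credited with would overstate its return.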
ProxAnn: Use-Oriented Evaluations of Topic Models and Document Clustering
Hoyle, Alexander, Calvo-Bartolomé, Lorena, Boyd-Graber, Jordan, Resnik, Philip
Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this protocol, we collect extensive crowdworker annotations of outputs from a diverse set of topic models on two datasets. We then use these annotations to validate automated proxies, finding that the best LLM proxies are statistically indistinguishable from a human annotator and can therefore serve as a reasonable substitute in automated evaluations. Package, web interface, and data are at https://github.com/ahoho/proxann
NEU-ESC: A Comprehensive Vietnamese dataset for Educational Sentiment analysis and topic Classification toward multitask learning
Mai, Phan Quoc Hung, Nguyen, Quang Hung, Duong, Phuong Giang, Nguyen, Hong Hanh, Long, Nguyen Tuan
In the field of education, understanding students' opinions through their comments is crucial, especially in the Vietnamese language, where resources remain limited. Existing educational datasets often lack domain relevance and student slang. To address these gaps, we introduce NEU-ESC, a new Vietnamese dataset for Educational Sentiment Classification and Topic Classification, curated from university forums, which offers more samples, richer class diversity, longer texts, and broader vocabulary. In addition, we explore multitask learning using encoder-only language models (BERT), which achieve up to 83.7% and 79.8% accuracy on the sentiment and topic classification tasks, respectively. We also benchmark our dataset and model against other datasets and models, including Large Language Models, and discuss the results. The dataset is publicly available at: https://huggingface.co/datasets/hung20gg/NEU-ESC.
Aligning Spoken Dialogue Models from User Interactions
Wu, Anne, Mazaré, Laurent, Zeghidour, Neil, Défossez, Alexandre
We propose a novel preference alignment framework for improving spoken dialogue models on real-time conversations from user interactions. Current preference learning methods primarily focus on text-based language models and are not directly suited to the complexities of real-time speech interactions, which have richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speaker turns. We create a large-scale dataset of more than 150,000 preference pairs from raw multi-turn speech conversations, annotated with AI feedback, to cover preferences over both linguistic content and temporal context variations. We leverage offline alignment methods to finetune a full-duplex autoregressive speech-to-speech model. Extensive experiments demonstrate that feedback on generic conversations can be consistently effective in improving spoken dialogue models to produce more factual, safer and more contextually aligned interactions. We deploy the finetuned model and conduct holistic human evaluations to assess the impact beyond single-turn conversations. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
AI-Driven Sentiment Analytics: Unlocking Business Value in the E-Commerce Landscape
Wu, Qianye, Xia, Chengxuan, Tian, Sixuan
The rapid growth of e-commerce has led to an overwhelming volume of customer feedback, from product reviews to service interactions. Extracting meaningful insights from this data is crucial for businesses aiming to improve customer satisfaction and optimize decision-making. This paper presents an AI-driven sentiment analysis system designed specifically for e-commerce applications, balancing accuracy with interpretability. Our approach integrates traditional machine learning techniques with modern deep learning models, allowing for a more nuanced understanding of customer sentiment while ensuring transparency in decision-making. Experimental results show that our system outperforms standard sentiment analysis methods, achieving an accuracy of 89.7% on diverse, large-scale datasets. Beyond technical performance, real-world implementation across multiple e-commerce platforms demonstrates tangible improvements in customer engagement and operational efficiency. This study highlights both the potential and the challenges of applying AI to sentiment analysis in a commercial setting, offering insights into practical deployment strategies and areas for future refinement.
ETS: Open Vocabulary Electroencephalography-To-Text Decoding and Sentiment Classification
Masry, Mohamed, Amen, Mohamed, Elzyat, Mohamed, Hamed, Mohamed, Magdy, Norhan, Khaled, Maram
Decoding natural language from brain activity using non-invasive electroencephalography (EEG) remains a significant challenge in neuroscience and machine learning, particularly for open-vocabulary scenarios where traditional methods struggle with noise and variability. Previous studies have achieved high accuracy on small closed vocabularies but still struggle with open vocabularies. In this study, we propose ETS, a framework that integrates EEG with synchronized eye-tracking data to address two critical tasks: (1) open-vocabulary text generation and (2) sentiment classification of perceived language. Our model achieves superior performance on BLEU and ROUGE scores for EEG-to-text decoding and up to 10% F1 score on EEG-based ternary sentiment classification, which significantly outperforms supervised baselines. Furthermore, we show that our proposed model can handle data from various subjects and sources, showing great potential for high-performance open-vocabulary EEG-to-text systems.