AITopics

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.95)

Neural Information Processing SystemsMay-27-2025, 03:56:59 GMT

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Speech encompasses a wealth of information, including but not limited to content, paralinguistic, and environmental information.This comprehensive nature of speech significantly impacts communication and is crucial for human-computer interaction.Chat-Oriented Large Language Models (LLMs), known for their general-purpose assistance capabilities, have evolved to handle multi-modal inputs, including speech.Although these models can be adept at recognizing and analyzing speech, they often fall short of generating appropriate responses.We argue that this is due to the lack of principles on task definition and model development, which requires open-source datasets and metrics suitable for model evaluation.To bridge the gap, we present SD-Eval, a benchmark dataset aimed at multidimensional evaluation of spoken dialogue understanding and generation.SD-Eval focuses on paralinguistic and environmental information and includes 7,303 utterances, amounting to 8.76 hours of speech data. The data is aggregated from eight public datasets, representing four perspectives: emotion, accent, age, and background sound.To assess the SD-Eval benchmark dataset, we implement three different models and construct a training set following a process similar to that of SD-Eval. The training set contains 1,052.72 hours of speech data and 724.4k utterances. We also conduct a comprehensive evaluation using objective evaluation methods (e.g.

artificial intelligence, large language model, natural language, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.63)

Neural Information Processing SystemsMay-27-2025, 03:43:29 GMT

Towards Robust Multimodal Sentiment Analysis with Incomplete Data

The field of Multimodal Sentiment Analysis (MSA) has recently witnessed an emerging direction seeking to tackle the issue of data incompleteness. Recognizing that the language modality typically contains dense sentiment information, we consider it as the dominant modality and present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust MSA. The proposed LNLN features a dominant modality correction (DMC) module and dominant modality based multimodal learning (DMML) module, which enhances the model's robustness across various noise scenarios by ensuring the quality of dominant modality representations. Aside from the methodical design, we perform comprehensive experiments under random data missing scenarios, utilizing diverse and meaningful settings on several popular datasets (e.g., MOSI, MOSEI, and SIMS), providing additional uniformity, transparency, and fairness compared to existing evaluations in the literature. Empirically, LNLN consistently outperforms existing baselines, demonstrating superior performance across these challenging and extensive evaluation metrics.

artificial intelligence, multimodal sentiment analysis, natural language, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.66)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.66)

Sapkota, Sagar, Hasan, Mohammad Saqib, Shah, Mubarak, Karmaker, Santu

Multi-Party Conversational Agents: A Survey

arXiv.org Artificial IntelligenceMay-27-2025

Multi-party Conversational Agents (MPCAs) are systems designed to engage in dialogue with more than two participants simultaneously. Unlike traditional two-party agents, designing MPCAs faces additional challenges due to the need to interpret both utterance semantics and social dynamics. This survey explores recent progress in MPCAs by addressing three key questions: 1) Can agents model each participants' mental states? (State of Mind Modeling); 2) Can they properly understand the dialogue content? (Semantic Understanding); and 3) Can they reason about and predict future conversation flow? (Agent Action Modeling). We review methods ranging from classical machine learning to Large Language Models (LLMs) and multi-modal systems. Our analysis underscores Theory of Mind (ToM) as essential for building intelligent MPCAs and highlights multi-modal understanding as a promising yet underexplored direction. Finally, this survey offers guidance to future researchers on developing more capable MPCAs.

computational linguistic, large language model, machine learning, (18 more...)

2505.18845

Country:

Europe (1.00)
Asia > Middle East > UAE (0.46)
North America > United States > New Mexico (0.28)
North America > United States > Minnesota (0.28)

Genre: Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.45)
Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Koudounas, Alkis, La Quatra, Moreno, Baralis, Elena

DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset

arXiv.org Artificial IntelligenceMay-27-2025

Recent advances in conversational AI have demonstrated impressive capabilities in single-turn responses, yet multi-turn dialogues remain challenging for even the most sophisticated language models. Current dialogue datasets are limited in their emotional range, domain diversity, turn depth, and are predominantly text-only, hindering progress in developing more human-like conversational systems across modalities. To address these limitations, we present DeepDialogue, a large-scale multimodal dataset containing 40,150 high-quality multi-turn dialogues spanning 41 domains and incorporating 20 distinct emotions with coherent emotional progressions. Our approach pairs 9 different language models (4B-72B parameters) to generate 65,600 initial conversations, which we then evaluate through a combination of human annotation and LLM-based quality filtering. The resulting dataset reveals fundamental insights: smaller models fail to maintain coherence beyond 6 dialogue turns; concrete domains (e.g., "cars," "travel") yield more meaningful conversations than abstract ones (e.g., "philosophy"); and cross-model interactions produce more coherent dialogues than same-model conversations. A key contribution of DeepDialogue is its speech component, where we synthesize emotion-consistent voices for all 40,150 dialogues, creating the first large-scale open-source multimodal dialogue dataset that faithfully preserves emotional context across multi-turn conversations.

large language model, machine learning, natural language, (21 more...)

2505.19978

Country:

Europe (0.46)
Asia (0.28)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.92)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Neural Information Processing SystemsMay-26-2025, 20:43:29 GMT

Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning

Multimodal Sentiment Analysis (MSA) is an important research area that aims to understand and recognize human sentiment through multiple modalities. The complementary information provided by multimodal fusion promotes better sentiment analysis compared to utilizing only a single modality. Nevertheless, in real-world applications, many unavoidable factors may lead to situations of uncertain modality missing, thus hindering the effectiveness of multimodal modeling and degrading the model's performance. To this end, we propose a Hierarchical Representation Learning Framework (HRLF) for the MSA task under uncertain missing modalities. Specifically, we propose a fine-grained representation factorization module that sufficiently extracts valuable sentiment information by factorizing modality into sentiment-relevant and modality-specific representations through crossmodal translation and sentiment semantic reconstruction.

artificial intelligence, multimodal sentiment analysis, natural language, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.89)

arXiv.org Artificial IntelligenceMay-26-2025

Development and Validation of Engagement and Rapport Scales for Evaluating User Experience in Multimodal Dialogue Systems

Kurata, Fuma, Saeki, Mao, Eguchi, Masaki, Suzuki, Shungo, Takatsu, Hiroaki, Matsuyama, Yoichi

This study aimed to develop and validate two scales of engagement and rapport to evaluate the user experience quality with multimodal dialogue systems in the context of foreign language learning. The scales were designed based on theories of engagement in educational psychology, social psychology, and second language acquisition.Seventy-four Japanese learners of English completed roleplay and discussion tasks with trained human tutors and a dialog agent. After each dialogic task was completed, they responded to the scales of engagement and rapport. The validity and reliability of the scales were investigated through two analyses. We first conducted analysis of Cronbach's alpha coefficient and a series of confirmatory factor analyses to test the structural validity of the scales and the reliability of our designed items. We then compared the scores of engagement and rapport between the dialogue with human tutors and the one with a dialogue agent. The results revealed that our scales succeeded in capturing the difference in the dialogue experience quality between the human interlocutors and the dialogue agent from multiple perspectives.

artificial intelligence, machine learning, natural language, (15 more...)

2505.17075

Country: Asia > Japan (0.47)

Genre:

Research Report > Experimental Study (0.94)
Research Report > New Finding (0.70)

Industry: Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.88)

Ebrahimi, Seyedeh Fatemeh, Peltonen, Jaakko

Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics

arXiv.org Artificial IntelligenceMay-23-2025

Topic models often fail to capture low-prevalence, domain-critical themes, so-called minority topics, such as mental health themes in online comments. While some existing methods can incorporate domain knowledge, such as expected topical content, methods allowing guidance may require overly detailed expected topics, hindering the discovery of topic divisions and variation. We propose a topic modeling solution via a specially constrained NMF. We incorporate a seed word list characterizing minority content of interest, but we do not require experts to pre-specify their division across minority topics. Through prevalence constraints on minority topics and seed word content across topics, we learn distinct data-driven minority topics as well as majority topics. The constrained NMF is fitted via Karush-Kuhn-Tucker (KKT) conditions with multiplicative updates. We outperform several baselines on synthetic data in terms of topic purity, normalized mutual information, and also evaluate topic quality using Jensen-Shannon divergence (JSD). We conduct a case study on YouTube vlog comments, analyzing viewer discussion of mental health content; our model successfully identifies and reveals this domain-relevant minority content.

constraint, machine learning, natural language, (17 more...)

2505.16493

Country:

Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(17 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.69)
(2 more...)

Saifullah, KM Khalid, Azmain, Faiaz, Hye, Habiba

Sentiment Analysis in Software Engineering: Evaluating Generative Pre-trained Transformers

arXiv.org Artificial IntelligenceMay-22-2025

Abstract--Sentiment analysis plays a crucial role in understanding developer interactions, issue resolutions, and p roject dynamics within software engineering (SE). While traditio nal SE-specific sentiment analysis tools have made significant s trides, they often fail to account for the nuanced and context-depen dent language inherent to the domain. This study systematically evaluates the performance of bidirectional transformers, such as BERT, against generative pre-trained transformers, speci fically GPT -4o-mini, in SE sentiment analysis. Th e results reveal that fine-tuned GPT -4o-mini performs comparab le to BERT and other bidirectional models on structured and balan ced datasets like GitHub and Jira, achieving macro-averaged F1 - scores of 0.93 and 0.98, respectively. However, on linguist ically complex datasets with imbalanced sentiment distributions, such as Stack Overflow, the default GPT -4o-mini model exhibits superior generalization, achieving an accuracy of 85.3% co m-pared to the fine-tuned model's 13.1%. The study underscores the importance of aligning model architectures with dataset characterist ics to optimize performance and proposes directions for future re search in refining sentiment analysis tools tailored to the SE domai n. Sentiment analysis, a critical subfield of natural language processing (NLP), involves classifying text into sentimen t polarities, such as positive, neutral, and negative. It has been widely studied across various domains, including software engineering (SE), where analyzing sentiments expressed in textual artifacts provides insights into developer intera ctions, issue resolution, and project dynamics.

large language model, machine learning, natural language, (18 more...)

2505.14692

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(2 more...)

arXiv.org Artificial IntelligenceMay-21-2025

PL-FGSA: A Prompt Learning Framework for Fine-Grained Sentiment Analysis Based on MindSpore

Qin, Zhenkai, He, Jiajing, Fang, Qiao

Fine-grained sentiment analysis (FGSA) aims to identify sentiment polarity toward specific aspects within a text, enabling more precise opinion mining in domains such as product reviews and social media. However, traditional FGSA approaches often require task-specific architectures and extensive annotated data, limiting their generalization and scalability. To address these challenges, we propose PL-FGSA, a unified prompt learning-based framework implemented using the MindSpore platform, which integrates prompt design with a lightweight TextCNN backbone. Our method reformulates FGSA as a multi-task prompt-augmented generation problem, jointly tackling aspect extraction, sentiment classification, and causal explanation in a unified paradigm. By leveraging prompt-based guidance, PL-FGSA enhances interpretability and achieves strong performance under both full-data and low-resource conditions. Experiments on three benchmark datasets-SST-2, SemEval-2014 Task 4, and MAMS-demonstrate that our model consistently outperforms traditional fine-tuning methods and achieves F1-scores of 0.922, 0.694, and 0.597, respectively. These results validate the effectiveness of prompt-based generalization and highlight the practical value of PL-FGSA for real-world sentiment analysis tasks.

artificial intelligence, natural language, pl-fgsa, (14 more...)

2505.14165

Country: Asia > China > Guangxi Province (0.15)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)