AITopics | Discourse & Dialogue

Collaborating Authors

Discourse & Dialogue

Understanding Language in Conversations "The problems addressed in discourse research aim to answer two general kinds of questions: (1) what information is contained in extended sequences of utterances that goes beyond the meaning of the individual utterances themselves? (2) how does the context in which an utterance is used affect the meaning of the individual utterances, or parts of them?"
– Barbara Grosz. Overview of Chapter 6: Discourse and Dialogue, Survey of the State of the Art in Human Language Technology (1996).

News Overviews Instructional Materials AI-Alerts Classics

Aligning Spoken Dialogue Models from User Interactions

Wu, Anne, Mazaré, Laurent, Zeghidour, Neil, Défossez, Alexandre

arXiv.org Artificial IntelligenceJun-27-2025

We propose a novel preference alignment framework for improving spoken dialogue models on real-time conversations from user interactions. Current preference learning methods primarily focus on text-based language models, and are not directly suited to the complexities of real-time speech interactions, with richer dynamics (e.g. interruption, interjection) and no explicit segmentation between speaker turns.We create a large-scale dataset of more than 150,000 preference pairs from raw multi-turn speech conversations, annotated with AI feedback, to cover preferences over both linguistic content and temporal context variations. We leverage offline alignment methods to finetune a full-duplex autoregressive speech-to-speech model. Extensive experiments demonstrate that feedback on generic conversations can be consistently effective in improving spoken dialogue models to produce more factual, safer and more contextually aligned interactions. We deploy the finetuned model and conduct holistic human evaluations to assess the impact beyond single-turn conversations. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.21463

Country:

Oceania > Australia (0.04)
North America > Canada (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Media (0.68)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)

Add feedback

AI-Driven Sentiment Analytics: Unlocking Business Value in the E-Commerce Landscape

Wu, Qianye, Xia, Chengxuan, Tian, Sixuan

arXiv.org Artificial IntelligenceJun-27-2025

The rapid growth of e-commerce has led to an overwhelming volume of customer feedback, from product reviews to service interactions. Extracting meaningful insights from this data is crucial for businesses aiming to improve customer satisfaction and optimize decision-making. This paper presents an AI-driven sentiment analysis system designed specifically for e-commerce applications, balancing accuracy with interpretability. Our approach integrates traditional machine learning techniques with modern deep learning models, allowing for a more nuanced understanding of customer sentiment while ensuring transparency in decision-making. Experimental results show that our system outperforms standard sentiment analysis methods, achieving an accuracy of 89.7% on diverse, large-scale datasets. Beyond technical performance, real-world implementation across multiple e-commerce platforms demonstrates tangible improvements in customer engagement and operational efficiency. This study highlights both the potential and the challenges of applying AI to sentiment analysis in a commercial setting, offering insights into practical deployment strategies and areas for future refinement.

machine learning, natural language, sentiment analysis, (17 more...)

arXiv.org Artificial Intelligence

2504.08738

Country: North America > United States > California > Santa Cruz County > Santa Cruz (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Services > e-Commerce Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ETS: Open Vocabulary Electroencephalography-To-Text Decoding and Sentiment Classification

Masry, Mohamed, Amen, Mohamed, Elzyat, Mohamed, Hamed, Mohamed, Magdy, Norhan, Khaled, Maram

arXiv.org Artificial IntelligenceJun-19-2025

Decoding natural language from brain activity using non-invasive electroencephalography (EEG) remains a significant challenge in neuroscience and machine learning, particularly for open-vocabulary scenarios where traditional methods struggle with noise and variability. Previous studies have achieved high accuracy on small-closed vocabularies, but it still struggles on open vocabularies. In this study, we propose ETS, a framework that integrates EEG with synchronized eye-tracking data to address two critical tasks: (1) open-vocabulary text generation and (2) sentiment classification of perceived language. Our model achieves a superior performance on BLEU and Rouge score for EEG-To-Text decoding and up to 10% F1 score on EEG-based ternary sentiment classification, which significantly outperforms supervised baselines. Furthermore, we show that our proposed model can handle data from various subjects and sources, showing great potential for high performance open vocabulary eeg-to-text system.

machine learning, natural language, text classification, (17 more...)

arXiv.org Artificial Intelligence

2506.14783

Country: North America > United States > California (0.28)

Genre: Research Report (0.85)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.69)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.61)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.84)

Add feedback

Detecting Narrative Shifts through Persistent Structures: A Topological Analysis of Media Discourse

Bailey, Mark M., Heiligman, Mark I.

arXiv.org Artificial IntelligenceJun-19-2025

How can we detect when global events fundamentally reshape public discourse? This study introduces a topological framework for identifying structural change in media narratives using persistent homology. Drawing on international news articles surrounding major events - including the Russian invasion of Ukraine (Feb 2022), the murder of George Floyd (May 2020), the U.S. Capitol insurrection (Jan 2021), and the Hamas-led invasion of Israel (Oct 2023) - we construct daily co-occurrence graphs of noun phrases to trace evolving discourse. Each graph is embedded and transformed into a persistence diagram via a Vietoris-Rips filtration. We then compute Wasserstein distances and persistence entropies across homological dimensions to capture semantic disruption and narrative volatility over time. Our results show that major geopolitical and social events align with sharp spikes in both H0 (connected components) and H1 (loops), indicating sudden reorganization in narrative structure and coherence. Cross-correlation analyses reveal a typical lag pattern in which changes to component-level structure (H0) precede higher-order motif shifts (H1), suggesting a bottom-up cascade of semantic change. An exception occurs during the Russian invasion of Ukraine, where H1 entropy leads H0, possibly reflecting top-down narrative framing before local discourse adjusts. Persistence entropy further distinguishes tightly focused from diffuse narrative regimes. These findings demonstrate that persistent homology offers a mathematically principled, unsupervised method for detecting inflection points and directional shifts in public attention - without requiring prior knowledge of specific events. This topological approach advances computational social science by enabling real-time detection of semantic restructuring during crises, protests, and information shocks.

data mining, machine learning, topological analysis, (16 more...)

arXiv.org Artificial Intelligence

2506.14836

Country:

North America > United States (1.00)
Asia > Middle East > Israel (0.24)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (0.48)
Government > Regional Government > Europe Government > Russia Government (0.45)
Government > Regional Government > Asia Government > Russia Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.94)
(2 more...)

Add feedback

EmoNews: A Spoken Dialogue System for Expressive News Conversations

Matsuura, Ryuki, Bharadwaj, Shikhar, Liu, Jiarui, Govindarajan, Dhatchi Kunde

arXiv.org Artificial IntelligenceJun-18-2025

We develop a task-oriented spoken dialogue system (SDS) that regulates emotional speech based on contextual cues to enable more empathetic news conversations. Despite advancements in emotional text-to-speech (TTS) techniques, task-oriented emotional SDSs remain underexplored due to the compartmentalized nature of SDS and emotional TTS research, as well as the lack of standardized evaluation metrics for social goals. We address these challenges by developing an emotional SDS for news conversations that utilizes a large language model (LLM)-based sentiment analyzer to identify appropriate emotions and PromptTTS to synthesize context-appropriate emotional speech. We also propose subjective evaluation scale for emotional SDSs and judge the emotion regulation performance of the proposed and baseline systems. Experiments showed that our emotional SDS outperformed a baseline system in terms of the emotion regulation and engagement. These results suggest the critical role of speech emotion for more engaging conversations. All our source code is open-sourced at https://github.com/dhatchi711/espnet-emotional-news/tree/emo-sds/egs2/emo_news_sds/sds1

artificial intelligence, natural language, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2506.13894

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(3 more...)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings

Dukić, David, Barić, Ana, Čuljak, Marko, Jukić, Josip, Tutek, Martin

arXiv.org Artificial IntelligenceJun-17-2025

Measuring how semantics of words change over time improves our understanding of how cultures and perspectives change. Diachronic word embeddings help us quantify this shift, although previous studies leveraged substantial temporally annotated corpora. In this work, we use a corpus of 9.5 million Croatian news articles spanning the past 25 years and quantify semantic change using skip-gram word embeddings trained on five-year periods. Our analysis finds that word embeddings capture linguistic shifts of terms pertaining to major topics in this timespan (COVID-19, Croatia joining the European Union, technological advancements). We also find evidence that embeddings from post-2020 encode increased positivity in sentiment analysis tasks, contrasting studies reporting a decline in mental health over the same period.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.13569

Country: Europe > Croatia (0.49)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)

Add feedback

A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations

Lan, Tian, Zhou, Yang-Hao, Ma, Zi-Ao, Sun, Fanshu, Sun, Rui-Qing, Luo, Junyu, Tu, Rong-Cheng, Huang, Heyan, Xu, Chen, Wu, Zhijing, Mao, Xian-Ling

arXiv.org Artificial IntelligenceJun-13-2025

Recent advances in deep learning have significantly enhanced generative AI capabilities across text, images, and audio. However, automatically evaluating the quality of these generated outputs presents ongoing challenges. Although numerous automatic evaluation methods exist, current research lacks a systematic framework that comprehensively organizes these methods across text, visual, and audio modalities. To address this issue, we present a comprehensive review and a unified taxonomy of automatic evaluation methods for generated content across all three modalities; We identify five fundamental paradigms that characterize existing evaluation approaches across these domains. Our analysis begins by examining evaluation methods for text generation, where techniques are most mature. We then extend this framework to image and audio generation, demonstrating its broad applicability. Finally, we discuss promising directions for future research in cross-modal evaluation methodologies.

generation negative sampling random sampling, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2506.10019

Country:

Europe (1.00)
Asia > China (0.68)
Asia > Middle East (0.67)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.67)
Information Technology > Security & Privacy (0.67)
Education > Assessment & Standards > Student Performance (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(6 more...)

Add feedback

What Users Value and Critique: Large-Scale Analysis of User Feedback on AI-Powered Mobile Apps

Chhetri, Vinaik, Upadhyay, Krishna, Siddique, A. B., Farooq, Umar

arXiv.org Artificial IntelligenceJun-13-2025

Artificial Intelligence (AI)-powered features have rapidly proliferated across mobile apps in various domains, including productivity, education, entertainment, and creativity. However, how users perceive, evaluate, and critique these AI features remains largely unexplored, primarily due to the overwhelming volume of user feedback. In this work, we present the first comprehensive, large-scale study of user feedback on AI-powered mobile apps, leveraging a curated dataset of 292 AI-driven apps across 14 categories with 894K AI-specific reviews from Google Play. We develop and validate a multi-stage analysis pipeline that begins with a human-labeled benchmark and systematically evaluates large language models (LLMs) and prompting strategies. Each stage, including review classification, aspect-sentiment extraction, and clustering, is validated for accuracy and consistency. Our pipeline enables scalable, high-precision analysis of user feedback, extracting over one million aspect-sentiment pairs clustered into 18 positive and 15 negative user topics. Our analysis reveals that users consistently focus on a narrow set of themes: positive comments emphasize productivity, reliability, and personalized assistance, while negative feedback highlights technical failures (e.g., scanning and recognition), pricing concerns, and limitations in language support. Our pipeline surfaces both satisfaction with one feature and frustration with another within the same review. These fine-grained, co-occurring sentiments are often missed by traditional approaches that treat positive and negative feedback in isolation or rely on coarse-grained analysis. To this end, our approach provides a more faithful reflection of the real-world user experiences with AI-powered apps. Category-aware analysis further uncovers both universal drivers of satisfaction and domain-specific frustrations.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2506.10785

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos

Reichman, Benjamin, Patsch, Constantin, Truxal, Jack, Jain, Atishay, Heck, Larry

arXiv.org Artificial IntelligenceJun-12-2025

In outside knowledge visual question answering (OK-VQA), the model must identify relevant visual information within an image and incorporate external knowledge to accurately respond to a question. Extending this task to a visually grounded dialogue setting based on videos, a conversational model must both recognize pertinent visual details over time and answer questions where the required information is not necessarily present in the visual information. Moreover, the context of the overall conversation must be considered for the subsequent dialogue. To explore this task, we introduce a dataset comprised of $2,017$ videos with $5,986$ human-annotated dialogues consisting of $40,954$ interleaved dialogue turns. While the dialogue context is visually grounded in specific video segments, the questions further require external knowledge that is not visually present. Thus, the model not only has to identify relevant video parts but also leverage external knowledge to converse within the dialogue. We further provide several baselines evaluated on our dataset and show future challenges associated with this task. The dataset is made publicly available here: https://github.com/c-patsch/OKCV.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2506.09953

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
(3 more...)

Add feedback

Approaching Dialogue State Tracking via Aligning Speech Encoders and LLMs

Sedláček, Šimon, Yusuf, Bolaji, Švec, Ján, Hegde, Pradyoth, Kesiraju, Santosh, Plchot, Oldřich, Černocký, Jan

arXiv.org Artificial IntelligenceJun-11-2025

In this work, we approach spoken Dialogue State Tracking (DST) by bridging the representation spaces of speech encoders and LLMs via a small connector module, with a focus on fully open-sourced and open-data components (WavLM-large, OLMo). We focus on ablating different aspects of such systems including full/LoRA adapter fine-tuning, the effect of agent turns in the dialogue history, as well as fuzzy matching-based output post-processing, which greatly improves performance of our systems on named entities in the dialogue slot values. We conduct our experiments on the SpokenWOZ dataset, and additionally utilize the Speech-Aware MultiWOZ dataset to augment our training data. Ultimately, our best-performing WavLM + connector + OLMo-1B aligned models achieve state of the art on the SpokenWOZ test set (34.66% JGA), and our system with Gemma-2-9B-instruct further surpasses this result, reaching 42.17% JGA on SpokenWOZ test.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2506.08633

Country: Europe > Czechia (0.47)

Genre: Research Report (0.64)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

Add feedback