AITopics | Discourse & Dialogue

Discourse & Dialogue

Understanding Language in Conversations

"The problems addressed in discourse research aim to answer two general kinds of questions: (1) what information is contained in extended sequences of utterances that goes beyond the meaning of the individual utterances themselves? (2) how does the context in which an utterance is used affect the meaning of the individual utterances, or parts of them?"
– Barbara Grosz. Overview of Chapter 6: Discourse and Dialogue, Survey of the State of the Art in Human Language Technology (1996).

Interruption Handling for Conversational Robots

Cao, Shiye, Moon, Jiwon, Mahmood, Amama, Antony, Victor Nikhil, Xiao, Ziang, Liu, Anqi, Huang, Chien-Ming

arXiv.org Artificial Intelligence, Jan-2-2025

Interruptions, a fundamental component of human communication, can enhance the dynamism and effectiveness of conversations, but only when effectively managed by all parties involved. Despite advancements in robotic systems, state-of-the-art systems still have limited capabilities in handling user-initiated interruptions in real-time. Prior research has primarily focused on post hoc analysis of interruptions. To address this gap, we present a system that detects user-initiated interruptions and manages them in real-time based on the interrupter's intent (i.e., cooperative agreement, cooperative assistance, cooperative clarification, or disruptive interruption). The system was designed based on interaction patterns identified from human-human interaction data. We integrated our system into an LLM-powered social robot and validated its effectiveness through a timed decision-making task and a contentious discussion task with 21 participants. Our system successfully handled 93.69% (n=104/111) of user-initiated interruptions. We discuss our learnings and their implications for designing interruption-handling behaviors in conversational robots.
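As a rough illustration of the taxonomy and the headline figure in this abstract, the sketch below encodes the four interruption intents as an enum and recomputes the reported handling rate from the stated counts. The class and function names are hypothetical and not taken from the paper. The abstract does not say how each intent is handled, so no handling logic is shown.

```python
from enum import Enum


class InterruptionIntent(Enum):
    """The four intent categories named in the abstract (identifiers are hypothetical)."""
    COOPERATIVE_AGREEMENT = "cooperative agreement"
    COOPERATIVE_ASSISTANCE = "cooperative assistance"
    COOPERATIVE_CLARIFICATION = "cooperative clarification"
    DISRUPTIVE = "disruptive interruption"


def handling_rate(handled: int, total: int) -> float:
    """Return the share of interruptions handled successfully, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * handled / total


# Counts reported in the abstract: 104 of 111 interruptions handled.
rate = handling_rate(104, 111)
print(f"{rate:.2f}%")  # prints "93.69%"
```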

interruption, large language model, natural language, (18 more...)

arXiv: 2501.01568

Country: North America > United States (0.68)

Genre:

Questionnaire & Opinion Survey (0.93)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.46)

Incremental Dialogue Management: Survey, Discussion, and Implications for HRI

Kennington, Casey, Lison, Pierre, Schlangen, David

arXiv.org Artificial Intelligence, Jan-1-2025

Efforts towards endowing robots with the ability to speak have benefited from recent advancements in NLP, in particular large language models. However, as powerful as current models have become, they still operate on sentence or multi-sentence level input, not on the word-by-word input that humans operate on. This limits the degree of responsiveness they offer, which is critical in situations where humans interact with robots using speech. In this paper, we review the literature on interactive systems that operate incrementally (i.e., at the word level or below it). We motivate the need for incremental systems and survey incremental modeling of important aspects of dialogue such as speech recognition and language generation. The primary focus is on the part of the system that makes decisions, known as the dialogue manager. We find that there is very little research on incremental dialogue management, offer some requirements for practical incremental dialogue management, and discuss the implications of incremental dialogue for embodied, robotic platforms.
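To make the contrast between sentence-level and word-by-word processing concrete, here is a minimal sketch of an incremental loop in which a toy decision rule can act before the utterance is complete. Everything in it (the function names, the trigger words, the action labels) is a hypothetical illustration, not a design from the survey.

```python
from typing import Iterator, List, Optional, Tuple


def incremental_prefixes(words: List[str]) -> Iterator[List[str]]:
    """Yield the growing word prefix, one word at a time, as an incremental recogniser might."""
    for end in range(1, len(words) + 1):
        yield words[:end]


def toy_decide(prefix: List[str]) -> Optional[str]:
    """Hypothetical decision rule: commit to an action as soon as a trigger word is heard."""
    triggers = {"stop": "halt_motion", "left": "turn_left", "right": "turn_right"}
    return triggers.get(prefix[-1].lower())


def first_decision(words: List[str]) -> Tuple[Optional[str], int]:
    """Return (action, words_consumed) for the first prefix that triggers a decision."""
    for prefix in incremental_prefixes(words):
        action = toy_decide(prefix)
        if action is not None:
            return action, len(prefix)
    return None, len(words)


utterance = "please turn left at the next door".split()
action, consumed = first_decision(utterance)
print(action, consumed, len(utterance))  # prints "turn_left 3 7"
```

In this toy run the decision is available after three of seven words, whereas a sentence-level system would wait for the full utterance.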

computational linguistic, dialogue system, proceedings, (15 more...)

arXiv: 2501.00953

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > France (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(16 more...)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation

Lu, Haitian, Cheng, Gaofeng, Luo, Liuping, Zhang, Leying, Qian, Yanmin, Zhang, Pengyuan

arXiv.org Artificial Intelligence, Jan-1-2025

Recently, "textless" speech language models (SLMs) based on speech units have made huge progress in generating naturalistic speech, including non-verbal vocalizations. However, the generated speech samples often lack semantic coherence. In this paper, we propose SLM and LLM Integration for spontaneous spoken Dialogue gEneration (SLIDE). Specifically, we first utilize an LLM to generate the textual content of spoken dialogue. Next, we convert the textual dialogues into phoneme sequences and use a two-tower transformer-based duration predictor to predict the duration of each phoneme. Finally, an SLM conditioned on the spoken phoneme sequences is used to vocalize the textual dialogue. Experimental results on the Fisher dataset demonstrate that our system can generate naturalistic spoken dialogue while maintaining high semantic coherence.

dialogue, phoneme sequence, semantic coherence, (11 more...)

arXiv: 2501.00805

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

DiffETM: Diffusion Process Enhanced Embedded Topic Model

Shao, Wei, Liu, Mingyang, Song, Linqi

arXiv.org Artificial Intelligence, Jan-1-2025

The embedded topic model (ETM) is a widely used approach that assumes the sampled document-topic distribution conforms to the logistic normal distribution for easier optimization. However, this assumption oversimplifies the real document-topic distribution, limiting the model's performance. In response, we propose a novel method that introduces the diffusion process into the sampling process of document-topic distribution to overcome this limitation and maintain an easy optimization process. We validate our method through extensive experiments on two mainstream datasets, proving its effectiveness in improving topic modeling performance.
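For readers unfamiliar with the logistic normal assumption mentioned in this abstract, the sketch below shows the standard construction: draw a Gaussian vector, then apply a softmax to obtain topic proportions. It assumes a diagonal covariance for simplicity, uses hypothetical function names, and does not attempt to reproduce the diffusion-based sampling the paper proposes.

```python
import math
import random
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Map real-valued scores to a probability vector (numerically stable)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_logistic_normal(mean: List[float], std: List[float], rng: random.Random) -> List[float]:
    """Draw delta ~ N(mean, diag(std^2)) and return theta = softmax(delta)."""
    delta = [rng.gauss(m, s) for m, s in zip(mean, std)]
    return softmax(delta)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
theta = sample_logistic_normal(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0], rng=rng)
print(theta)  # three positive topic proportions that sum to 1
```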

document-topic distribution, machine learning, natural language, (18 more...)

arXiv: 2501.00862

Country: Asia > China (0.16)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.75)

The Text Classification Pipeline: Starting Shallow going Deeper

Siino, Marco, Tinnirello, Ilenia, La Cascia, Marco

arXiv.org Artificial Intelligence, Dec-30-2024

Text Classification (TC) stands as a cornerstone within the realm of Natural Language Processing (NLP), particularly when viewed through the lens of computer science and engineering. The past decade has seen deep learning revolutionize TC, propelling advancements in text retrieval, categorization, information extraction, and summarization. The scholarly literature is rich with datasets, models, and evaluation criteria, with English being the predominant language of focus, despite studies involving Arabic, Chinese, Hindi, and others. The efficacy of TC models relies heavily on their ability to capture intricate textual relationships and nonlinear correlations, necessitating a comprehensive examination of the entire TC pipeline. This monograph provides an in-depth exploration of the TC pipeline, with a particular emphasis on evaluating the impact of each component on the overall performance of TC models. The pipeline includes state-of-the-art datasets, text preprocessing techniques, text representation methods, classification models, evaluation metrics, current results and future trends. Each chapter meticulously examines these stages, presenting technical innovations and significant recent findings. The work critically assesses various classification strategies, offering comparative analyses, examples, case studies, and experimental evaluations. These contributions extend beyond a typical survey, providing a detailed and insightful exploration of TC.

machine learning, natural language, text classification, (25 more...)

arXiv: 2501.00174

Country:

Europe (1.00)
Asia > Japan > Honshū (0.27)
North America > United States > California (0.27)

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(4 more...)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
(11 more...)

Machine Learning for Sentiment Analysis of Imported Food in Trinidad and Tobago

Daniels, Cassandra, Khan, Koffka

arXiv.org Artificial Intelligence, Dec-27-2024

This research investigates the performance of various machine learning algorithms (CNN, LSTM, VADER, and RoBERTa) for sentiment analysis of Twitter data related to imported food items in Trinidad and Tobago. The study addresses three primary research questions: the comparative accuracy and efficiency of the algorithms, the optimal configurations for each model, and the potential applications of the optimized models in a live system for monitoring public sentiment and its impact on the import bill. The dataset comprises tweets from 2018 to 2024, divided into imbalanced, balanced, and temporal subsets to assess the impact of data balancing and the COVID-19 pandemic on sentiment trends. Ten experiments were conducted to evaluate the models under various configurations. Results indicated that VADER outperformed the other models in both multi-class and binary sentiment classifications. The study highlights significant changes in sentiment trends pre- and post-COVID-19, with implications for import policies.

machine learning, natural language, sentiment, (19 more...)

arXiv: 2412.19781

Country:

North America > Trinidad and Tobago (0.62)
Europe > Sweden (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SILC-EFSA: Self-aware In-context Learning Correction for Entity-level Financial Sentiment Analysis

Zhu, Senbin, He, Chenyuan, Liu, Hongde, Dong, Pengcheng, Zhao, Hanjie, Yan, Yuchen, Jia, Yuxiang, Zan, Hongying, Peng, Min

arXiv.org Artificial Intelligence, Dec-26-2024

In recent years, fine-grained sentiment analysis in finance has gained significant attention, but the scarcity of entity-level datasets remains a key challenge. To address this, we have constructed the largest English and Chinese financial entity-level sentiment analysis datasets to date. Building on this foundation, we propose a novel two-stage sentiment analysis approach called Self-aware In-context Learning Correction (SILC). The first stage involves fine-tuning a base large language model to generate pseudo-labeled data specific to our task. In the second stage, we train a correction model using a GNN-based example retriever, which is informed by the pseudo-labeled data. This two-stage strategy has allowed us to achieve state-of-the-art performance on the newly constructed datasets, advancing the field of financial sentiment analysis. In a case study, we demonstrate the enhanced practical utility of our data and methods in monitoring the cryptocurrency market. Our datasets and code are available at https://github.com/NLP-Bin/SILC-EFSA.

large language model, machine learning, natural language, (18 more...)

arXiv: 2412.1914

Country: Asia > China (0.68)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(2 more...)

DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis

Wang, Pan, Zhou, Qiang, Wu, Yawen, Chen, Tianlong, Hu, Jingtong

arXiv.org Artificial Intelligence, Dec-26-2024

Multimodal Sentiment Analysis (MSA) leverages heterogeneous modalities, such as language, vision, and audio, to enhance the understanding of human sentiment. While existing models often focus on extracting shared information across modalities or directly fusing heterogeneous modalities, such approaches can introduce redundancy and conflicts due to equal treatment of all modalities and the mutual transfer of information between modality pairs. To address these issues, we propose a Disentangled-Language-Focused (DLF) multimodal representation learning framework, which incorporates a feature disentanglement module to separate modality-shared and modality-specific information. To further reduce redundancy and enhance language-targeted features, four geometric measures are introduced to refine the disentanglement process. A Language-Focused Attractor (LFA) is further developed to strengthen language representation by leveraging complementary modality-specific information through a language-guided cross-attention mechanism. The framework also employs hierarchical predictions to improve overall accuracy. Extensive experiments on two popular MSA datasets, CMU-MOSI and CMU-MOSEI, demonstrate the significant performance gains achieved by the proposed DLF framework. Comprehensive ablation studies further validate the effectiveness of the feature disentanglement module, language-focused attractor, and hierarchical predictions. Our code is available at https://github.com/pwang322/DLF.

artificial intelligence, modality, natural language, (17 more...)

arXiv: 2412.12225

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.63)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.63)

Bidirectional Topic Matching: Quantifying Thematic Overlap Between Corpora Through Topic Modelling

Adam, Raven, Kogler, Marie Lisa

arXiv.org Artificial Intelligence, Dec-24-2024

This study introduces Bidirectional Topic Matching (BTM), a novel method for cross-corpus topic modeling that quantifies thematic overlap and divergence between corpora. BTM is a flexible framework that can incorporate various topic modeling approaches, including BERTopic, Top2Vec, and Latent Dirichlet Allocation (LDA). BTM employs a dual-model approach, training separate topic models for each corpus and applying them reciprocally to enable comprehensive cross-corpus comparisons. This methodology facilitates the identification of shared themes and unique topics, providing nuanced insights into thematic relationships. Validation against cosine similarity-based methods demonstrates the robustness of BTM, with strong agreement metrics and distinct advantages in handling outlier topics. A case study on climate news articles showcases BTM's utility, revealing significant thematic overlaps and distinctions between corpora focused on climate change and climate action. BTM's flexibility and precision make it a valuable tool for diverse applications, from political discourse analysis to interdisciplinary studies. By integrating shared and unique topic analyses, BTM offers a comprehensive framework for exploring thematic relationships, with potential extensions to multilingual and dynamic datasets. This work highlights BTM's methodological contributions and its capacity to advance discourse analysis across various domains.
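The dual-model idea in this abstract (one topic model per corpus, each applied to the other corpus) can be sketched with toy stand-ins. In the sketch below, a "topic model" is just a hand-written mapping from topic labels to keyword sets, and a document is assigned to the topic with the largest keyword overlap. These stand-ins, the corpora, and all names are assumptions for illustration. The paper itself works with trained models such as BERTopic, Top2Vec, or LDA, and its matching and agreement metrics are not reproduced here.

```python
from typing import Dict, List, Optional, Set

TopicModel = Dict[str, Set[str]]  # topic label -> keyword set (toy stand-in for a trained model)


def assign_topic(model: TopicModel, document: str) -> Optional[str]:
    """Return the topic whose keywords overlap the document most, or None for an outlier."""
    tokens = set(document.lower().split())
    best_label, best_overlap = None, 0
    for label, keywords in model.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label


def apply_model(model: TopicModel, corpus: List[str]) -> Dict[Optional[str], int]:
    """Count how many documents of a corpus fall under each topic of the given model."""
    counts: Dict[Optional[str], int] = {}
    for document in corpus:
        label = assign_topic(model, document)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Hand-written toy models and corpora (assumptions for illustration only).
model_a: TopicModel = {"emissions": {"carbon", "emissions"}, "weather": {"storm", "heat"}}
model_b: TopicModel = {"policy": {"tax", "law"}, "emissions": {"carbon", "emissions"}}
corpus_a = ["carbon emissions rise", "heat wave and storm damage"]
corpus_b = ["new carbon tax law", "carbon emissions targets"]

# Reciprocal application: model A labels corpus B, and model B labels corpus A.
a_on_b = apply_model(model_a, corpus_b)
b_on_a = apply_model(model_b, corpus_a)
print(a_on_b)  # prints "{'emissions': 2}"
print(b_on_a)  # prints "{'emissions': 1, None: 1}"
```

In this toy setup, documents that the other corpus's model cannot place (the `None` bucket) hint at topics unique to their own corpus, while documents that land in an existing topic hint at shared themes.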

bidirectional topic matching, corpora, native topic, (13 more...)

arXiv: 2412.18376

Country:

Europe > Germany (0.28)
Europe > Austria > Styria > Graz (0.04)
Europe > Poland (0.04)
(4 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.71)

An Overview and Discussion of the Suitability of Existing Speech Datasets to Train Machine Learning Models for Collective Problem Solving

Villuri, Gnaneswar, Doboli, Alex

arXiv.org Artificial Intelligence, Dec-24-2024

This report characterized the suitability of existing datasets for devising new Machine Learning models, decision making methods, and analysis algorithms to improve Collaborative Problem Solving and then enumerated requirements for future datasets. Problem solving was assumed to be performed in teams of about three or four members who talked to each other. A dataset consists of the speech recordings of such teams. The characterization methodology was based on metrics that capture cognitive, social, and emotional activities and situations. The report presented the analysis of a large group of datasets developed for Spoken Language Understanding, a research area with some similarity to Collaborative Problem Solving.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv: 2412.18489

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.67)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(2 more...)
