AITopics | Discourse & Dialogue

Collaborating Authors

Discourse & Dialogue

Understanding Language in Conversations "The problems addressed in discourse research aim to answer two general kinds of questions: (1) what information is contained in extended sequences of utterances that goes beyond the meaning of the individual utterances themselves? (2) how does the context in which an utterance is used affect the meaning of the individual utterances, or parts of them?"
– Barbara Grosz. Overview of Chapter 6: Discourse and Dialogue, Survey of the State of the Art in Human Language Technology (1996).

News Overviews Instructional Materials AI-Alerts Classics

Towards Actionable Pedagogical Feedback: A Multi-Perspective Analysis of Mathematics Teaching and Tutoring Dialogue

Naim, Jannatun, Cao, Jie, Tasneem, Fareen, Jacobs, Jennifer, Milne, Brent, Martin, James, Sumner, Tamara

arXiv.org Artificial IntelligenceAug-5-2025

Effective feedback is essential for refining instructional practices in mathematics education, and researchers often turn to advanced natural language processing (NLP) models to analyze classroom dialogues from multiple perspectives. However, utterance-level discourse analysis encounters two primary challenges: (1) multifunctionality, where a single utterance may serve multiple purposes that a single tag cannot capture, and (2) the exclusion of many utterances from domain-specific discourse move classifications, leading to their omission in feedback. To address these challenges, we proposed a multi-perspective discourse analysis that integrates domain-specific talk moves with dialogue act (using the flattened multi-functional SWBD-MASL schema with 43 tags) and discourse relation (applying Segmented Discourse Representation Theory with 16 relations). Our top-down analysis framework enables a comprehensive understanding of utterances that contain talk moves, as well as utterances that do not contain talk moves. This is applied to two mathematics education datasets: TalkMoves (teaching) and SAGA22 (tutoring). Through distributional unigram analysis, sequential talk move analysis, and multi-view deep dive, we discovered meaningful discourse patterns, and revealed the vital role of utterances without talk moves, demonstrating that these utterances, far from being mere fillers, serve crucial functions in guiding, acknowledging, and structuring classroom discourse. These insights underscore the importance of incorporating discourse relations and dialogue acts into AI-assisted education systems to enhance feedback and create more responsive learning environments. Our framework may prove helpful for providing human educator feedback, but also aiding in the development of AI agents that can effectively emulate the roles of both educators and students.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.07161

Country:

Europe (1.00)
Asia > Middle East (0.67)
North America > United States > California (0.46)
North America > United States > Colorado > Boulder County > Boulder (0.14)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.67)

Industry:

Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.93)
Education > Assessment & Standards (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes

Cao, Jie, Tanana, Michael, Imel, Zac E., Poitras, Eric, Atkins, David C., Srikumar, Vivek

arXiv.org Artificial IntelligenceAug-5-2025

Automatically analyzing dialogue can help understand and guide behavior in domains such as counseling, where interactions are largely mediated by conversation. In this paper, we study modeling behavioral codes used to asses a psychotherapy treatment style called Motivational Interviewing (MI), which is effective for addressing substance abuse and related problems. Specifically, we address the problem of providing real-time guidance to therapists with a dialogue observer that (1) categorizes therapist and client MI behavioral codes and, (2) forecasts codes for upcoming utterances to help guide the conversation and potentially alert the therapist. For both tasks, we define neural network models that build upon recent successes in dialogue modeling. Our experiments demonstrate that our models can outperform several baselines for both tasks. We also report the results of a careful analysis that reveals the impact of the various network design tradeoffs for modeling therapy dialogue.

machine learning, natural language, utterance, (20 more...)

arXiv.org Artificial Intelligence

1907.00326

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (0.93)
Research Report > Strength High (0.68)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Consumer Health (0.88)
Health & Medicine > Pharmaceuticals & Biotechnology (0.66)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Holistic Evaluations of Topic Models

Compton, Thomas

arXiv.org Artificial IntelligenceAug-1-2025

Topic models are gaining increasing commercial and academic interest for their ability to summarize large volumes of unstructured text. As unsupervised machine learning methods, they enable researchers to explore data and help general users understand key themes in large text collections. However, they risk becoming a 'black box', where users input data and accept the output as an accurate summary without scrutiny. This article evaluates topic models from a database perspective, drawing insights from 1140 BERTopic model runs. The goal is to identify trade-offs in optimizing model parameters and to reflect on what these findings mean for the interpretation and responsible use of topic models

machine learning, natural language, topic model, (21 more...)

arXiv.org Artificial Intelligence

2507.23364

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)

Add feedback

Model Directions, Not Words: Mechanistic Topic Models Using Sparse Autoencoders

Zheng, Carolina, Beltran-Velez, Nicolas, Karlekar, Sweta, Shi, Claudia, Nazaret, Achille, Mallik, Asif, Feder, Amir, Blei, David M.

arXiv.org Artificial IntelligenceAug-1-2025

Traditional topic models are effective at uncovering latent themes in large text collections. However, due to their reliance on bag-of-words representations, they struggle to capture semantically abstract features. While some neural variants use richer representations, they are similarly constrained by expressing topics as word lists, which limits their ability to articulate complex topics. We introduce Mechanistic Topic Models (MTMs), a class of topic models that operate on interpretable features learned by sparse autoencoders (SAEs). By defining topics over this semantically rich space, MTMs can reveal deeper conceptual themes with expressive feature descriptions. Moreover, uniquely among topic models, MTMs enable controllable text generation using topic-based steering vectors. To properly evaluate MTM topics against word-list-based approaches, we propose \textit{topic judge}, an LLM-based pairwise comparison evaluation framework. Across five datasets, MTMs match or exceed traditional and neural baselines on coherence metrics, are consistently preferred by topic judge, and enable effective steering of LLM outputs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.2322

Country:

North America > United States (1.00)
Asia (0.67)
Europe (0.67)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Law (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(2 more...)

Add feedback

InsurTech innovation using natural language processing

Dong, Panyi, Quan, Zhiyu

arXiv.org Machine LearningJul-30-2025

InsurTech refers to the use of state-of-the-art technology, including both emerging hardware and software, to address inefficiencies across the insurance value chain and further explore new opportunities to reshape traditional business operations. InsurTech encompasses a broad spectrum of technology-driven innovations, including, but not limited to, telematics, usage-based insurance, and the integration of Internet of Things (IoT) sensors. In this study, we focus on a specific class of InsurTech, an Insurtech data vendor, that provides insurance companies with next-generation data solutions. We leverage new and diverse external data sources, such as social media data and online content, to enrich the internal database, thereby empowering actuarial analytics and gaining more accurate insights into risk profiles and policyholder behavior. Specifically, by integrating alternative data sources beyond traditional information, insurance companies can uncover previously unrecognized risk factors, reduce bias in existing features, and identify more accurate risk exposures based on the operational characteristics of the insured entities.

information retrieval, large language model, machine learning, (24 more...)

arXiv.org Machine Learning

2507.21112

Country:

North America > United States > California (0.05)
North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > New Jersey (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Banking & Finance > Insurance (1.00)
Government > Regional Government > North America Government > United States Government (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(7 more...)

Add feedback

FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems

Peng, Yizhou, Chao, Yi-Wen, Ng, Dianwen, Ma, Yukun, Ni, Chongjia, Ma, Bin, Chng, Eng Siong

arXiv.org Artificial IntelligenceJul-28-2025

Full-duplex spoken dialogue systems (FDSDS) enable more natural human-machine interactions by allowing real-time user interruptions and backchanneling, compared to traditional SDS that rely on turn-taking. However, existing benchmarks lack metrics for FD scenes, e.g., evaluating model performance during user interruptions. In this paper, we present a comprehensive FD benchmarking pipeline utilizing LLMs, TTS, and ASR to address this gap. It assesses FDSDS's ability to handle user interruptions, manage delays, and maintain robustness in challenging scenarios with diverse novel metrics. We applied our benchmark to three open-source FDSDS (Moshi, Freeze-omni, and VIT A-1.5) using over 40 hours of generated speech, with 293 simulated conversations and 1,200 interruptions. The results show that all models continue to face challenges, such as failing to respond to user interruptions, under frequent disruptions and noisy conditions. Demonstrations, data, and code will be released.

interruption, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2507.1904

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

FinDPO: Financial Sentiment Analysis for Algorithmic Trading through Preference Optimization of LLMs

Iacovides, Giorgos, Zhou, Wuyang, Mandic, Danilo

arXiv.org Artificial IntelligenceJul-25-2025

Opinions expressed in online finance-related textual data are having an increasingly profound impact on trading decisions and market movements. This trend highlights the vital role of sentiment analysis as a tool for quantifying the nature and strength of such opinions. With the rapid development of Generative AI (GenAI), supervised fine-tuned (SFT) large language models (LLMs) have become the de facto standard for financial sentiment analysis. However, the SFT paradigm can lead to memorization of the training data and often fails to generalize to unseen samples. This is a critical limitation in financial domains, where models must adapt to previously unobserved events and the nuanced, domain-specific language of finance. To this end, we introduce FinDPO, the first finance-specific LLM framework based on post-training human preference alignment via Direct Preference Optimization (DPO). The proposed FinDPO achieves state-of-the-art performance on standard sentiment classification benchmarks, outperforming existing supervised fine-tuned models by 11% on the average. Uniquely, the FinDPO framework enables the integration of a fine-tuned causal LLM into realistic portfolio strategies through a novel 'logit-to-score' conversion, which transforms discrete sentiment predictions into continuous, rankable sentiment scores (probabilities). In this way, simulations demonstrate that FinDPO is the first sentiment-based approach to maintain substantial positive returns of 67% annually and strong risk-adjusted performance, as indicated by a Sharpe ratio of 2.0, even under realistic transaction costs of 5 basis points (bps).

large language model, machine learning, sentiment analysis, (15 more...)

arXiv.org Artificial Intelligence

2507.18417

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Mapping Technological Futures: Anticipatory Discourse Through Text Mining

Skorski, Maciej, Landowska, Alina, Rajda, Krzysztof

arXiv.org Artificial IntelligenceJul-25-2025

The volatility and unpredictability of emerging technologies, such as artificial intelligence (AI), generate significant uncertainty, which is widely discussed on social media. This study examines anticipatory discourse surrounding technological futures by analysing 1.5 million posts from 400 key opinion leaders (KOLs) published on the X platform (from 2021 to 2023). Using advanced text mining techniques, including BERTopic modelling, sentiment, emotion, and attitude analyses, the research identifies 100 distinct topics reflecting anticipated tech-driven futures. Our findings emphasize the dual role of KOLs in framing \textit{present futures} -- optimistic visions of transformative technologies like AI and IoT -- and influencing \textit{future presents}, where these projections shape contemporary societal and geopolitical debates. Positive emotions such as Hope dominate, outweighing Anxiety, particularly in topics like ``Machine Learning, Data Science, and Deep Learning,'' while discussions around ``Climate Change'' and ``War, Ukraine, and Trump People'' elicit \textit{Anxiety}. By framing technologies as solutions to societal challenges, KOLs act as mediators of societal narratives, bridging imagined futures and current realities. These insights underscore their pivotal role in directing public attention with emerging technologies during periods of heightened uncertainty, advancing our understanding of anticipatory discourse in technology-mediated contexts.

data mining, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2504.02853

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.28)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Government (1.00)
Banking & Finance (1.00)
Information Technology > Security & Privacy (0.93)
(4 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)
Information Technology > Data Science > Data Mining > Text Mining (0.61)
(3 more...)

Add feedback

BoSS: Beyond-Semantic Speech

Wang, Qing, Li, Zehan, Lv, Hang, Chen, Hongjie, Song, Yaodong, Kang, Jian, Lian, Jie, Li, Jie, Li, Yongxiang, He, Zhongjiang, Li, Xuelong

arXiv.org Artificial IntelligenceJul-24-2025

Human communication involves more than explicit semantics, with implicit signals and contextual cues playing a critical role in shaping meaning. However, modern speech technologies, such as Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) often fail to capture these beyond-semantic dimensions. To better characterize and benchmark the progression of speech intelligence, we introduce Spoken Interaction System Capability Levels (L1-L5), a hierarchical framework illustrated the evolution of spoken dialogue systems from basic command recognition to human-like social interaction. To support these advanced capabilities, we propose Beyond-Semantic Speech (BoSS), which refers to the set of information in speech communication that encompasses but transcends explicit semantics. It conveys emotions, contexts, and modifies or extends meanings through multidimensional features such as affective cues, contextual dynamics, and implicit semantics, thereby enhancing the understanding of communicative intentions and scenarios. We present a formalized framework for BoSS, leveraging cognitive relevance theories and machine learning models to analyze temporal and contextual speech dynamics. We evaluate BoSS-related attributes across five different dimensions, reveals that current spoken language models (SLMs) are hard to fully interpret beyond-semantic signals. These findings highlight the need for advancing BoSS research to enable richer, more context-aware human-machine communication.

information, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2507.17563

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)

Add feedback

CLAMP: Contrastive Learning with Adaptive Multi-loss and Progressive Fusion for Multimodal Aspect-Based Sentiment Analysis

He, Xiaoqiang

arXiv.org Artificial IntelligenceJul-24-2025

Multimodal aspect-based sentiment analysis(MABSA) seeks to identify aspect terms within paired image-text data and determine their fine grained sentiment polarities, representing a fundamental task for improving the effectiveness of applications such as product review systems and public opinion monitoring. Existing methods face challenges such as cross modal alignment noise and insufficient consistency in fine-grained representations. While global modality alignment methods often overlook the connection between aspect terms and their corresponding local visual regions, bridging the representation gap between text and images remains a challenge. To address these limitations, this paper introduces an end to end Contrastive Learning framework with Adaptive Multi-loss and Progressive Attention Fusion(CLAMP). The framework is composed of three novel modules: Progressive Attention Fusion network, Multi-task Contrastive Learning, and Adaptive Multi-loss Aggregation. The Progressive Attention Fusion network enhances fine-grained alignment between textual features and image regions via hierarchical, multi-stage cross modal interactions, effectively suppressing irrelevant visual noise. Secondly, multi-task contrastive learning combines global modal contrast and local granularity alignment to enhance cross modal representation consistency. Adaptive Multi-loss Aggregation employs a dynamic uncertainty based weighting mechanism to calibrate loss contributions according to each task's uncertainty, thereby mitigating gradient interference. Evaluation on standard public benchmarks demonstrates that CLAMP consistently outperforms the vast majority of existing state of the art methods.

machine learning, natural language, sentiment analysis, (23 more...)

arXiv.org Artificial Intelligence

2507.16854

Country: Asia (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
(3 more...)

Add feedback