AITopics | Discourse & Dialogue

Collaborating Authors

Discourse & Dialogue

Understanding Language in Conversations "The problems addressed in discourse research aim to answer two general kinds of questions: (1) what information is contained in extended sequences of utterances that goes beyond the meaning of the individual utterances themselves? (2) how does the context in which an utterance is used affect the meaning of the individual utterances, or parts of them?"
– Barbara Grosz. Overview of Chapter 6: Discourse and Dialogue, Survey of the State of the Art in Human Language Technology (1996).

News Overviews Instructional Materials AI-Alerts Classics

Labeled Interactive Topic Models

Seelman, Kyle, Zhang, Mozhi, Boyd-Graber, Jordan

arXiv.org Artificial IntelligenceFeb-7-2024

Topic models are valuable for understanding extensive document collections, but they don't always identify the most relevant topics. Classical probabilistic and anchor-based topic models offer interactive versions that allow users to guide the models towards more pertinent topics. However, such interactive features have been lacking in neural topic models. To correct this lacuna, we introduce a user-friendly interaction for neural topic models. This interaction permits users to assign a word label to a topic, leading to an update in the topic model where the words in the topic become closely aligned with the given label. Our approach encompasses two distinct kinds of neural topic models. The first includes models where topic embeddings are trainable and evolve during the training process. The second kind involves models where topic embeddings are integrated post-training, offering a different approach to topic refinement. To facilitate user interaction with these neural topic models, we have developed an interactive interface. This interface enables users to engage with and re-label topics as desired. We evaluate our method through a human study, where users can relabel topics to find relevant documents. Using our method, user labeling improves document rank scores, helping to find more relevant documents to a given query when compared to no user labeling.

computational linguistic, neural topic model, topic model, (12 more...)

arXiv.org Artificial Intelligence

2311.09438

Country:

Asia > Middle East > Jordan (0.06)
North America > Cuba (0.05)
Asia > India (0.05)
(10 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

An Analysis of Dialogue Repair in Voice Assistants

Galbraith, Matthew

arXiv.org Artificial IntelligenceFeb-7-2024

Spoken dialogue systems have transformed human-machine interaction by providing real-time responses to queries. However, misunderstandings between the user and system persist. This study explores the significance of interactional language in dialogue repair between virtual assistants and users by analyzing interactions with Google Assistant and Siri, focusing on their utilization and response to the other-initiated repair strategy "huh?" prevalent in human-human interaction. Findings reveal several assistant-generated strategies but an inability to replicate human-like repair strategies such as "huh?". English and Spanish user acceptability surveys show differences in users' repair strategy preferences and assistant usage, with both similarities and disparities among the two surveyed languages. These results shed light on inequalities between interactional language in human-human interaction and human-machine interaction, underscoring the need for further research on the impact of interactional language in human-machine interaction in English and beyond.

interaction, interactional language, repair strategy, (10 more...)

arXiv.org Artificial Intelligence

2311.03952

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.71)

Add feedback

CFTM: Continuous time fractional topic model

Nakagawa, Kei, Hayashi, Kohei, Fujimoto, Yugo

arXiv.org Artificial IntelligenceFeb-6-2024

In this paper, we propose the Continuous Time Fractional Topic Model (cFTM), a new method for dynamic topic modeling. This approach incorporates fractional Brownian motion~(fBm) to effectively identify positive or negative correlations in topic and word distribution over time, revealing long-term dependency or roughness. Our theoretical analysis shows that the cFTM can capture these long-term dependency or roughness in both topic and word distributions, mirroring the main characteristics of fBm. Moreover, we prove that the parameter estimation process for the cFTM is on par with that of LDA, traditional topic models. To demonstrate the cFTM's property, we conduct empirical study using economic news articles. The results from these tests support the model's ability to identify and track long-term dependency or roughness in topics over time.

cftm, roughness, topic model, (14 more...)

arXiv.org Artificial Intelligence

2402.01734

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom (0.06)
(4 more...)

Genre: Research Report (1.00)

Industry: Banking & Finance > Economy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Homograph Attacks on Maghreb Sentiment Analyzers

Qachfar, Fatima Zahra, Verma, Rakesh M.

arXiv.org Artificial IntelligenceFeb-5-2024

We examine the impact of homograph attacks on the Sentiment Analysis (SA) task of different Arabic dialects from the Maghreb North-African countries. Homograph attacks result in a 65.3% decrease in transformer classification from an F1-score of 0.95 to 0.33 when data is written in "Arabizi". The goal of this study is to highlight LLMs weaknesses' and to prioritize ethical and responsible Machine Learning.

dataset, dialect, homograph attack, (11 more...)

arXiv.org Artificial Intelligence

2402.03171

Country:

North America > United States > Texas > Harris County > Houston (0.06)
Africa > North Africa (0.05)
Africa > Middle East > Tunisia (0.05)
(3 more...)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.77)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.77)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Linguistic features for sentence difficulty prediction in ABSA

Chifu, Adrian-Gabriel, Fournier, Sébastien

arXiv.org Artificial IntelligenceFeb-5-2024

One of the challenges of natural language understanding is to deal with the subjectivity of sentences, which may express opinions and emotions that add layers of complexity and nuance. Sentiment analysis is a field that aims to extract and analyze these subjective elements from text, and it can be applied at different levels of granularity, such as document, paragraph, sentence, or aspect. Aspect-based sentiment analysis is a well-studied topic with many available data sets and models. However, there is no clear definition of what makes a sentence difficult for aspect-based sentiment analysis. In this paper, we explore this question by conducting an experiment with three data sets: "Laptops", "Restaurants", and "MTSC" (Multi-Target-dependent Sentiment Classification), and a merged version of these three datasets. We study the impact of domain diversity and syntactic diversity on difficulty. We use a combination of classifiers to identify the most difficult sentences and analyze their characteristics. We employ two ways of defining sentence difficulty. The first one is binary and labels a sentence as difficult if the classifiers fail to correctly predict the sentiment polarity. The second one is a six-level scale based on how many of the top five best-performing classifiers can correctly predict the sentiment polarity. We also define 9 linguistic features that, combined, aim at estimating the difficulty at sentence level.

corpus, experiment, sentiment analysis, (14 more...)

arXiv.org Artificial Intelligence

2402.03163

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.05)
Asia > Singapore (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.47)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)

Add feedback

With a Little Help from my (Linguistic) Friends: Topic Segmentation of Multi-party Casual Conversations

Decker, Amandine, Amblard, Maxime

arXiv.org Artificial IntelligenceFeb-5-2024

Topics play an important role in the global organisation of a conversation as what is currently discussed constrains the possible contributions of the participant. Understanding the way topics are organised in interaction would provide insight on the structure of dialogue beyond the sequence of utterances. However, studying this high-level structure is a complex task that we try to approach by first segmenting dialogues into smaller topically coherent sets of utterances. Understanding the interactions between these segments would then enable us to propose a model of topic organisation at a dialogue level. In this paper we work with open-domain conversations and try to reach a comparable level of accuracy as recent machine learning based topic segmentation models but with a formal approach. The features we identify as meaningful for this task help us understand better the topical structure of a conversation.

segmentation, utterance, xing and carenini, (14 more...)

arXiv.org Artificial Intelligence

2402.02837

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
(15 more...)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.68)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.46)

Add feedback

A Quantitative Discourse Analysis of Asian Workers in the US Historical Newspapers

Park, Jaihyun, Cordell, Ryan

arXiv.org Artificial IntelligenceFeb-4-2024

Warning: This paper contains examples of offensive language targetting marginalized population. The digitization of historical texts invites researchers to explore the large-scale corpus of historical texts with computational methods. In this study, we present computational text analysis on a relatively understudied topic of how Asian workers are represented in historical newspapers in the United States. We found that the word "coolie" was semantically different in some States (e.g., Massachusetts, Rhode Island, Wyoming, Oklahoma, and Arkansas) with the different discourses around coolie. We also found that then-Confederate newspapers and then-Union newspapers formed distinctive discourses by measuring over-represented words. Newspapers from then-Confederate States associated coolie with slavery-related words. In addition, we found Asians were perceived to be inferior to European immigrants and subjected to the target of racism. This study contributes to supplementing the qualitative analysis of racism in the United States with quantitative discourse analysis.

coolie, newspaper, united states, (15 more...)

arXiv.org Artificial Intelligence

2402.02572

Country:

North America > United States > Massachusetts (0.26)
North America > United States > Rhode Island (0.26)
North America > United States > Arkansas (0.25)
(50 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Media > News (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.49)

Add feedback

Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon

Koto, Fajri, Beck, Tilman, Talat, Zeerak, Gurevych, Iryna, Baldwin, Timothy

arXiv.org Artificial IntelligenceFeb-3-2024

Improving multilingual language models capabilities in low-resource languages is generally difficult due to the scarcity of large-scale data in those languages. In this paper, we relax the reliance on texts in low-resource languages by using multilingual lexicons in pretraining to enhance multilingual capabilities. Specifically, we focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets. We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets, and large language models like GPT--3.5, BLOOMZ, and XGLM. These findings are observable for unseen low-resource languages to code-mixed scenarios involving high-resource languages.

computational linguistic, lexicon, ml lex, (13 more...)

arXiv.org Artificial Intelligence

2402.02113

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Asia > China > Beijing > Beijing (0.04)
(27 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties

Artemova, Ekaterina, Blaschke, Verena, Plank, Barbara

arXiv.org Artificial IntelligenceFeb-3-2024

Mainstream cross-lingual task-oriented dialogue (ToD) systems leverage the transfer learning paradigm by training a joint model for intent recognition and slot-filling in English and applying it, zero-shot, to other languages. We address a gap in prior research, which often overlooked the transfer to lower-resource colloquial varieties due to limited test data. Inspired by prior work on English varieties, we craft and manually evaluate perturbation rules that transform German sentences into colloquial forms and use them to synthesize test sets in four ToD datasets. Our perturbation rules cover 18 distinct language phenomena, enabling us to explore the impact of each perturbation on slot and intent performance. Using these new datasets, we conduct an experimental evaluation across six different transformers. Here, we demonstrate that when applied to colloquial varieties, ToD systems maintain their intent recognition performance, losing 6% (4.62 percentage points) in accuracy on average. However, they exhibit a significant drop in slot detection, with a decrease of 31% (21 percentage points) in slot F1 score. Our findings are further supported by a transfer experiment from Standard American English to synthetic Urban African American Vernacular English.

computational linguistic, perturbation, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2402.02078

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
(24 more...)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)

Add feedback

Self-training Strategies for Sentiment Analysis: An Empirical Study

Liu, Haochen, Rallabandi, Sai Krishna, Wu, Yijing, Dakle, Parag Pravin, Raghavan, Preethi

arXiv.org Artificial IntelligenceFeb-3-2024

Sentiment analysis is a crucial task in natural language processing that involves identifying and extracting subjective sentiment from text. Self-training has recently emerged as an economical and efficient technique for developing sentiment analysis models by leveraging a small amount of labeled data and a large amount of unlabeled data. However, given a set of training data, how to utilize them to conduct self-training makes a significant difference in the final performance of the model. We refer to this methodology as the self-training strategy. In this paper, we present an empirical study of various self-training strategies for sentiment analysis. First, we investigate the influence of the self-training strategy and hyper-parameters on the performance of traditional small language models (SLMs) in various few-shot settings. Second, we also explore the feasibility of leveraging large language models (LLMs) to help self-training. We propose and empirically compare several self-training strategies with the intervention of LLMs. Extensive experiments are conducted on three real-world sentiment analysis datasets.

dataset, llm, sentiment analysis, (17 more...)

arXiv.org Artificial Intelligence

2309.08777

Country:

Asia > Macao (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback