Goto

Collaborating Authors

 Discourse & Dialogue


Iterative Improvement of an Additively Regularized Topic Model

arXiv.org Artificial Intelligence

Topic modelling is fundamentally a soft clustering problem (of known objects -- documents, over unknown clusters -- topics). That is, the task is incorrectly posed. In particular, the topic models are unstable and incomplete. All this leads to the fact that the process of finding a good topic model (repeated hyperparameter selection, model training, and topic quality assessment) can be particularly long and labor-intensive. We aim to simplify the process, to make it more deterministic and provable. To this end, we present a method for iterative training of a topic model. The essence of the method is that a series of related topic models are trained so that each subsequent model is at least as good as the previous one, i.e., that it retains all the good topics found earlier. The connection between the models is achieved by additive regularization. The result of this iterative training is the last topic model in the series, which we call the iteratively updated additively regularized topic model (ITAR). Experiments conducted on several collections of natural language texts show that the proposed ITAR model performs better than other popular topic models (LDA, ARTM, BERTopic), its topics are diverse, and its perplexity (ability to "explain" the underlying data) is moderate.


Enhancing Visual Dialog State Tracking through Iterative Object-Entity Alignment in Multi-Round Conversations

arXiv.org Artificial Intelligence

Visual Dialog (VD) is a task where an agent answers a series of image-related questions based on a multi-round dialog history. However, previous VD methods often treat the entire dialog history as a simple text input, disregarding the inherent conversational information flows at the round level. In this paper, we introduce Multi-round Dialogue State Tracking model (MDST), a framework that addresses this limitation by leveraging the dialogue state learned from dialog history to answer questions. MDST captures each round of dialog history, constructing internal dialogue state representations defined as 2-tuples of vision-language representations. These representations effectively ground the current question, enabling the generation of accurate answers. Experimental results on the VisDial v1.0 dataset demonstrate that MDST achieves a new state-of-the-art performance in generative setting. Furthermore, through a series of human studies, we validate the effectiveness of MDST in generating long, consistent, and human-like answers while consistently answering a series of questions correctly.


Survey on biomarkers in human vocalizations

arXiv.org Artificial Intelligence

Recent years has witnessed an increase in technologies that use speech for the sensing of the health of the talker. This survey paper proposes a general taxonomy of the technologies and a broad overview of current progress and challenges. Vocal biomarkers are often secondary measures that are approximating a signal of another sensor or identifying an underlying mental, cognitive, or physiological state. Their measurement involve disturbances and uncertainties that may be considered as noise sources and the biomarkers are coarsely qualified in terms of the various sources of noise involved in their determination. While in some proposed biomarkers the error levels seem high, there are vocal biomarkers where the errors are expected to be low and thus are more likely to qualify as candidates for adoption in healthcare applications.


Analyzing Consumer Reviews for Understanding Drivers of Hotels Ratings: An Indian Perspective

arXiv.org Artificial Intelligence

In the internet era, almost every business entity is trying to have its digital footprint in digital media and other social media platforms. For these entities, word of mouse is also very important. Particularly, this is quite crucial for the hospitality sector dealing with hotels, restaurants etc. Consumers do read other consumers reviews before making final decisions. This is where it becomes very important to understand which aspects are affecting most in the minds of the consumers while giving their ratings. The current study focuses on the consumer reviews of Indian hotels to extract aspects important for final ratings. The study involves gathering data using web scraping methods, analyzing the texts using Latent Dirichlet Allocation for topic extraction and sentiment analysis for aspect-specific sentiment mapping. Finally, it incorporates Random Forest to understand the importance of the aspects in predicting the final rating of a user.


Target-Dependent Multimodal Sentiment Analysis Via Employing Visual-to Emotional-Caption Translation Network using Visual-Caption Pairs

arXiv.org Artificial Intelligence

The natural language processing and multimedia field has seen a notable surge in interest in multimodal sentiment recognition. Hence, this study aims to employ Target-Dependent Multimodal Sentiment Analysis (TDMSA) to identify the level of sentiment associated with every target (aspect) stated within a multimodal post consisting of a visual-caption pair. Despite the recent advancements in multimodal sentiment recognition, there has been a lack of explicit incorporation of emotional clues from the visual modality, specifically those pertaining to facial expressions. The challenge at hand is to proficiently obtain visual and emotional clues and subsequently synchronise them with the textual content. In light of this fact, this study presents a novel approach called the Visual-to-Emotional-Caption Translation Network (VECTN) technique. The primary objective of this strategy is to effectively acquire visual sentiment clues by analysing facial expressions. Additionally, it effectively aligns and blends the obtained emotional clues with the target attribute of the caption mode. The experimental findings demonstrate that our methodology is capable of producing ground-breaking outcomes when applied to two publicly accessible multimodal Twitter datasets, namely, Twitter-2015 and Twitter-2017. The experimental results show that the suggested model achieves an accuracy of 81.23% and a macro-F1 of 80.61% on the Twitter-15 dataset, while 77.42% and 75.19% on the Twitter-17 dataset, respectively. The observed improvement in performance reveals that our model is better than others when it comes to collecting target-level sentiment in multimodal data using the expressions of the face.


Infusing Emotions into Task-oriented Dialogue Systems: Understanding, Management, and Generation

arXiv.org Artificial Intelligence

Emotions are indispensable in human communication, but are often overlooked in task-oriented dialogue (ToD) modelling, where the task success is the primary focus. While existing works have explored user emotions or similar concepts in some ToD tasks, none has so far included emotion modelling into a fully-fledged ToD system nor conducted interaction with human or simulated users. In this work, we incorporate emotion into the complete ToD processing loop, involving understanding, management, and generation. To this end, we extend the EmoWOZ dataset (Feng et al., 2022) with system affective behaviour labels. Through interactive experimentation involving both simulated and human users, we demonstrate that our proposed framework significantly enhances the user's emotional experience as well as the task success.


Tracking Emotional Dynamics in Chat Conversations: A Hybrid Approach using DistilBERT and Emoji Sentiment Analysis

arXiv.org Artificial Intelligence

Computer-mediated communication has become more important than face-to-face communication in many contexts. Tracking emotional dynamics in chat conversations can enhance communication, improve services, and support well-being in various contexts. This paper explores a hybrid approach to tracking emotional dynamics in chat conversations by combining DistilBERT-based text emotion detection and emoji sentiment analysis. A Twitter dataset was analyzed using various machine learning algorithms, including SVM, Random Forest, and AdaBoost. We contrasted their performance with DistilBERT. Results reveal DistilBERT's superior performance in emotion recognition. Our approach accounts for emotive expressions conveyed through emojis to better understand participants' emotions during chats. We demonstrate how this approach can effectively capture and analyze emotional shifts in real-time conversations. Our findings show that integrating text and emoji analysis is an effective way of tracking chat emotion, with possible applications in customer service, work chats, and social media interactions.


Transforming Slot Schema Induction with Generative Dialogue State Inference

arXiv.org Artificial Intelligence

The challenge of defining a slot schema to represent the state of a task-oriented dialogue system is addressed by Slot Schema Induction (SSI), which aims to automatically induce slots from unlabeled dialogue data. Whereas previous approaches induce slots by clustering value spans extracted directly from the dialogue text, we demonstrate the power of discovering slots using a generative approach. By training a model to generate slot names and values that summarize key dialogue information with no prior task knowledge, our SSI method discovers high-quality candidate information for representing dialogue state. These discovered slot-value candidates can be easily clustered into unified slot schemas that align well with human-authored schemas. Experimental comparisons on the MultiWOZ and SGD datasets demonstrate that Generative Dialogue State Inference (GenDSI) outperforms the previous state-of-the-art on multiple aspects of the SSI task.


HBot: A Chatbot for Healthcare Applications in Traditional Chinese Medicine Based on Human Body 3D Visualization

arXiv.org Artificial Intelligence

The unique diagnosis and treatment techniques and remarkable clinical efficacy of traditional Chinese medicine (TCM) make it play an important role in the field of elderly care and healthcare, especially in the rehabilitation of some common chronic diseases of the elderly. Therefore, building a TCM chatbot for healthcare application will help users obtain consultation services in a direct and natural way. However, concepts such as acupuncture points (acupoints) and meridians involved in TCM always appear in the consultation, which cannot be displayed intuitively. To this end, we develop a \textbf{h}ealthcare chat\textbf{bot} (HBot) based on a human body model in 3D and knowledge graph, which provides conversational services such as knowledge Q\&A, prescription recommendation, moxibustion therapy recommendation, and acupoint search. When specific acupoints are involved in the conversations between user and HBot, the 3D body will jump to the corresponding acupoints and highlight them. Moreover, Hbot can also be used in training scenarios to accelerate the teaching process of TCM by intuitively displaying acupuncture points and knowledge cards. The demonstration video is available at https://www.youtube.com/watch?v=UhQhutSKkTU . Our code and dataset are publicly available at Gitee: https://gitee.com/plabrolin/interactive-3d-acup.git


Generative Sentiment Analysis via Latent Category Distribution and Constrained Decoding

arXiv.org Artificial Intelligence

Fine-grained sentiment analysis involves extracting and organizing sentiment elements from textual data. However, existing approaches often overlook issues of category semantic inclusion and overlap, as well as inherent structural patterns within the target sequence. This study introduces a generative sentiment analysis model. To address the challenges related to category semantic inclusion and overlap, a latent category distribution variable is introduced. By reconstructing the input of a variational autoencoder, the model learns the intensity of the relationship between categories and text, thereby improving sequence generation. Additionally, a trie data structure and constrained decoding strategy are utilized to exploit structural patterns, which in turn reduces the search space and regularizes the generation process. Experimental results on the Restaurant-ACOS and Laptop-ACOS datasets demonstrate a significant performance improvement compared to baseline models. Ablation experiments further confirm the effectiveness of latent category distribution and constrained decoding strategy.