Discourse & Dialogue
A Self-Adjusting Fusion Representation Learning Model for Unaligned Text-Audio Sequences
Yang, Kaicheng, Zhang, Ruxuan, Xu, Hua, Gao, Kai
Inter-modal interaction plays an indispensable role in multimodal sentiment analysis. Due to different modalities sequences are usually non-alignment, how to integrate relevant information of each modality to learn fusion representations has been one of the central challenges in multimodal learning. In this paper, a Self-Adjusting Fusion Representation Learning Model (SA-FRLM) is proposed to learn robust crossmodal fusion representations directly from the unaligned text and audio sequences. Different from previous works, our model not only makes full use of the interaction between different modalities but also maximizes the protection of the unimodal characteristics. Specifically, we first employ a crossmodal alignment module to project different modalities features to the same dimension. The crossmodal collaboration attention is then adopted to model the inter-modal interaction between text and audio sequences and initialize the fusion representations. After that, as the core unit of the SA-FRLM, the crossmodal adjustment transformer is proposed to protect original unimodal characteristics. It can dynamically adapt the fusion representations by using single modal streams. We evaluate our approach on the public multimodal sentiment analysis datasets CMU-MOSI and CMU-MOSEI. The experiment results show that our model has significantly improved the performance of all the metrics on the unaligned text-audio sequences.
Lifelong and Continual Learning Dialogue Systems
Dialogue systems, commonly known as chatbots, have gained escalating popularity in recent times due to their wide-spread applications in carrying out chit-chat conversations with users and task-oriented dialogues to accomplish various user tasks. Existing chatbots are usually trained from pre-collected and manually-labeled data and/or written with handcrafted rules. Many also use manually-compiled knowledge bases (KBs). Their ability to understand natural language is still limited, and they tend to produce many errors resulting in poor user satisfaction. Typically, they need to be constantly improved by engineers with more labeled data and more manually compiled knowledge. This book introduces the new paradigm of lifelong learning dialogue systems to endow chatbots the ability to learn continually by themselves through their own self-initiated interactions with their users and working environments to improve themselves. As the systems chat more and more with users or learn more and more from external sources, they become more and more knowledgeable and better and better at conversing. The book presents the latest developments and techniques for building such continual learning dialogue systems that continuously learn new language expressions and lexical and factual knowledge during conversation from users and off conversation from external sources, acquire new training examples during conversation, and learn conversational skills. Apart from these general topics, existing works on continual learning of some specific aspects of dialogue systems are also surveyed. The book concludes with a discussion of open challenges for future research.
A Span-level Bidirectional Network for Aspect Sentiment Triplet Extraction
Chen, Yuqi, Chen, Keming, Sun, Xian, Zhang, Zequn
Aspect Sentiment Triplet Extraction (ASTE) is a new fine-grained sentiment analysis task that aims to extract triplets of aspect terms, sentiments, and opinion terms from review sentences. Recently, span-level models achieve gratifying results on ASTE task by taking advantage of the predictions of all possible spans. Since all possible spans significantly increases the number of potential aspect and opinion candidates, it is crucial and challenging to efficiently extract the triplet elements among them. In this paper, we present a span-level bidirectional network which utilizes all possible spans as input and extracts triplets from spans bidirectionally. Specifically, we devise both the aspect decoder and opinion decoder to decode the span representations and extract triples from aspect-to-opinion and opinion-to-aspect directions. With these two decoders complementing with each other, the whole network can extract triplets from spans more comprehensively. Moreover, considering that mutual exclusion cannot be guaranteed between the spans, we design a similar span separation loss to facilitate the downstream task of distinguishing the correct span by expanding the KL divergence of similar spans during the training process; in the inference process, we adopt an inference strategy to remove conflicting triplets from the results base on their confidence scores. Experimental results show that our framework not only significantly outperforms state-of-the-art methods, but achieves better performance in predicting triplets with multi-token entities and extracting triplets in sentences contain multi-triplets.
Sentiment Analysis of Persian Language: Review of Algorithms, Approaches and Datasets
Nazarizadeh, Ali, Banirostam, Touraj, Sayyadpour, Minoo
Sentiment analysis aims to extract people's emotions and opinion from their comments on the web. It widely used in businesses to detect sentiment in social data, gauge brand reputation, and understand customers. Most of articles in this area have concentrated on the English language whereas there are limited resources for Persian language. In this review paper, recent published articles between 2018 and 2022 in sentiment analysis in Persian Language have been collected and their methods, approach and dataset will be explained and analyzed. Almost all the methods used to solve sentiment analysis are machine learning and deep learning. The purpose of this paper is to examine 40 different approach sentiment analysis in the Persian Language, analysis datasets along with the accuracy of the algorithms applied to them and also review strengths and weaknesses of each. Among all the methods, transformers such as BERT and RNN Neural Networks such as LSTM and Bi-LSTM have achieved higher accuracy in the sentiment analysis. In addition to the methods and approaches, the datasets reviewed are listed between 2018 and 2022 and information about each dataset and its details are provided.
Control Globally, Understand Locally: A Global-to-Local Hierarchical Graph Network for Emotional Support Conversation
Peng, Wei, Hu, Yue, Xing, Luxi, Xie, Yuqiang, Sun, Yajing, Li, Yunpeng
Emotional support conversation aims at reducing the emotional distress of the help-seeker, which is a new and challenging task. It requires the system to explore the cause of help-seeker's emotional distress and understand their psychological intention to provide supportive responses. However, existing methods mainly focus on the sequential contextual information, ignoring the hierarchical relationships with the global cause and local psychological intention behind conversations, thus leads to a weak ability of emotional support. In this paper, we propose a Global-to-Local Hierarchical Graph Network to capture the multi-source information (global cause, local intentions and dialog history) and model hierarchical relationships between them, which consists of a multi-source encoder, a hierarchical graph reasoner, and a global-guide decoder. Furthermore, a novel training objective is designed to monitor semantic information of the global cause. Experimental results on the emotional support conversation dataset, ESConv, confirm that the proposed GLHG has achieved the state-of-the-art performance on the automatic and human evaluations. The code will be released in here \footnote{\small{~https://github.com/pengwei-iie/GLHG}}.
AX-MABSA: A Framework for Extremely Weakly Supervised Multi-label Aspect Based Sentiment Analysis
Kamila, Sabyasachi, Magdy, Walid, Dutta, Sourav, Wang, MingXue
Aspect Based Sentiment Analysis is a dominant research area with potential applications in social media analytics, business, finance, and health. Prior works in this area are primarily based on supervised methods, with a few techniques using weak supervision limited to predicting a single aspect category per review sentence. In this paper, we present an extremely weakly supervised multi-label Aspect Category Sentiment Analysis framework which does not use any labelled data. We only rely on a single word per class as an initial indicative information. We further propose an automatic word selection technique to choose these seed categories and sentiment words. We explore unsupervised language model post-training to improve the overall performance, and propose a multi-label generator model to generate multiple aspect category-sentiment pairs per review sentence. Experiments conducted on four benchmark datasets showcase our method to outperform other weakly supervised baselines by a significant margin.
Tell Your Story: Task-Oriented Dialogs for Interactive Content Creation
Kottur, Satwik, Moon, Seungwhan, Markosyan, Aram H., Shah, Hardik, Damavandi, Babak, Geramifard, Alborz
People capture photos and videos to relive and share memories of personal significance. Recently, media montages (stories) have become a popular mode of sharing these memories due to their intuitive and powerful storytelling capabilities. However, creating such montages usually involves a lot of manual searches, clicks, and selections that are time-consuming and cumbersome, adversely affecting user experiences. To alleviate this, we propose task-oriented dialogs for montage creation as a novel interactive tool to seamlessly search, compile, and edit montages from a media collection. To the best of our knowledge, our work is the first to leverage multi-turn conversations for such a challenging application, extending the previous literature studying simple media retrieval tasks. We collect a new dataset C3 (Conversational Content Creation), comprising 10k dialogs conditioned on media montages simulated from a large media collection. We take a simulate-and-paraphrase approach to collect these dialogs to be both cost and time efficient, while drawing from natural language distribution. Our analysis and benchmarking of state-of-the-art language models showcase the multimodal challenges present in the dataset. Lastly, we present a real-world mobile demo application that shows the feasibility of the proposed work in real-world applications. Our code and data will be made publicly available.
End-to-End Evaluation of a Spoken Dialogue System for Learning Basic Mathematics
Okur, Eda, Sahay, Saurav, Alba, Roddy Fuentes, Nachman, Lama
The advances in language-based Artificial Intelligence (AI) technologies applied to build educational applications can present AI for social-good opportunities with a broader positive impact. Across many disciplines, enhancing the quality of mathematics education is crucial in building critical thinking and problem-solving skills at younger ages. Conversational AI systems have started maturing to a point where they could play a significant role in helping students learn fundamental math concepts. This work presents a task-oriented Spoken Dialogue System (SDS) built to support play-based learning of basic math concepts for early childhood education. The system has been evaluated via real-world deployments at school while the students are practicing early math concepts with multimodal interactions. We discuss our efforts to improve the SDS pipeline built for math learning, for which we explore utilizing MathBERT representations for potential enhancement to the Natural Language Understanding (NLU) module. We perform an end-to-end evaluation using real-world deployment outputs from the Automatic Speech Recognition (ASR), Intent Recognition, and Dialogue Manager (DM) components to understand how error propagation affects the overall performance in real-world scenarios.
A Multi-task Model for Sentiment Aided Stance Detection of Climate Change Tweets
Upadhyaya, Apoorva, Fisichella, Marco, Nejdl, Wolfgang
Climate change has become one of the biggest challenges of our time. Social media platforms such as Twitter play an important role in raising public awareness and spreading knowledge about the dangers of the current climate crisis. With the increasing number of campaigns and communication about climate change through social media, the information could create more awareness and reach the general public and policy makers. However, these Twitter communications lead to polarization of beliefs, opinion-dominated ideologies, and often a split into two communities of climate change deniers and believers. In this paper, we propose a framework that helps identify denier statements on Twitter and thus classifies the stance of the tweet into one of the two attitudes towards climate change (denier/believer). The sentimental aspects of Twitter data on climate change are deeply rooted in general public attitudes toward climate change. Therefore, our work focuses on learning two closely related tasks: Stance Detection and Sentiment Analysis of climate change tweets. We propose a multi-task framework that performs stance detection (primary task) and sentiment analysis (auxiliary task) simultaneously. The proposed model incorporates the feature-specific and shared-specific attention frameworks to fuse multiple features and learn the generalized features for both tasks. The experimental results show that the proposed framework increases the performance of the primary task, i.e., stance detection by benefiting from the auxiliary task, i.e., sentiment analysis compared to its uni-modal and single-task variants.
Proactive Detractor Detection Framework Based on Message-Wise Sentiment Analysis Over Customer Support Interactions
Gallo, Juan Sebastián Salcedo, Solano, Jesús, García, Javier Hernán, Zarruk-Valencia, David, Correa-Bahnsen, Alejandro
In this work, we propose a framework relying solely on chat-based customer support (CS) interactions for predicting the recommendation decision of individual users. For our case study, we analyzed a total number of 16.4k users and 48.7k customer support conversations within the financial vertical of a large e-commerce company in Latin America. Consequently, our main contributions and objectives are to use Natural Language Processing (NLP) to assess and predict the recommendation behavior where, in addition to using static sentiment analysis, we exploit the predictive power of each user's sentiment dynamics. Our results show that, with respective feature interpretability, it is possible to predict the likelihood of a user to recommend a product or service, based solely on the message-wise sentiment evolution of their CS conversations in a fully automated way.