Goto

Collaborating Authors

 emotion information


MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt

arXiv.org Artificial Intelligence

Most existing Zero-Shot Text-To-Speech(ZS-TTS) systems generate the unseen speech based on single prompt, such as reference speech or text descriptions, which limits their flexibility. We propose a customized emotion ZS-TTS system based on multi-modal prompt. The system disentangles speech into the content, timbre, emotion and prosody, allowing emotion prompts to be provided as text, image or speech. To extract emotion information from different prompts, we propose a multi-modal prompt emotion encoder. Additionally, we introduce an prosody predictor to fit the distribution of prosody and propose an emotion consistency loss to preserve emotion information in the predicted prosody. A diffusion-based acoustic model is employed to generate the target mel-spectrogram. Both objective and subjective experiments demonstrate that our system outperforms existing systems in terms of naturalness and similarity.


Toward a Dialogue System Using a Large Language Model to Recognize User Emotions with a Camera

arXiv.org Artificial Intelligence

The performance of ChatGPT\copyright{} and other LLMs has improved tremendously, and in online environments, they are increasingly likely to be used in a wide variety of situations, such as ChatBot on web pages, call center operations using voice interaction, and dialogue functions using agents. In the offline environment, multimodal dialogue functions are also being realized, such as guidance by Artificial Intelligence agents (AI agents) using tablet terminals and dialogue systems in the form of LLMs mounted on robots. In this multimodal dialogue, mutual emotion recognition between the AI and the user will become important. So far, there have been methods for expressing emotions on the part of the AI agent or for recognizing them using textual or voice information of the user's utterances, but methods for AI agents to recognize emotions from the user's facial expressions have not been studied. In this study, we examined whether or not LLM-based AI agents can interact with users according to their emotional states by capturing the user in dialogue with a camera, recognizing emotions from facial expressions, and adding such emotion information to prompts. The results confirmed that AI agents can have conversations according to the emotional state for emotional states with relatively high scores, such as Happy and Angry.


Conditioning LLMs with Emotion in Neural Machine Translation

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown remarkable performance in Natural Language Processing tasks, including Machine Translation (MT). In this work, we propose a novel MT pipeline that integrates emotion information extracted from a Speech Emotion Recognition (SER) model into LLMs to enhance translation quality. We first fine-tune five existing LLMs on the Libri-trans dataset and select the most performant model. Subsequently, we augment LLM prompts with different dimensional emotions and train the selected LLM under these different configurations. Our experiments reveal that integrating emotion information, especially arousal, into LLM prompts leads to notable improvements in translation quality.


Usefulness of Emotional Prosody in Neural Machine Translation

arXiv.org Artificial Intelligence

Neural Machine Translation (NMT) is the task of translating a text from one language to another with the use of a trained neural network. Several existing works aim at incorporating external information into NMT models to improve or control predicted translations (e.g. sentiment, politeness, gender). In this work, we propose to improve translation quality by adding another external source of information: the automatically recognized emotion in the voice. This work is motivated by the assumption that each emotion is associated with a specific lexicon that can overlap between emotions. Our proposed method follows a two-stage procedure. At first, we select a state-of-the-art Speech Emotion Recognition (SER) model to predict dimensional emotion values from all input audio in the dataset. Then, we use these predicted emotions as source tokens added at the beginning of input texts to train our NMT model. We show that integrating emotion information, especially arousal, into NMT systems leads to better translations.


Knowledge-Bridged Causal Interaction Network for Causal Emotion Entailment

arXiv.org Artificial Intelligence

Causal Emotion Entailment aims to identify causal utterances that are responsible for the target utterance with a non-neutral emotion in conversations. Previous works are limited in thorough understanding of the conversational context and accurate reasoning of the emotion cause. To this end, we propose Knowledge-Bridged Causal Interaction Network (KBCIN) with commonsense knowledge (CSK) leveraged as three bridges. Specifically, we construct a conversational graph for each conversation and leverage the event-centered CSK as the semantics-level bridge (S-bridge) to capture the deep inter-utterance dependencies in the conversational context via the CSK-Enhanced Graph Attention module. Moreover, social-interaction CSK serves as emotion-level bridge (E-bridge) and action-level bridge (A-bridge) to connect candidate utterances with the target one, which provides explicit causal clues for the Emotional Interaction module and Actional Interaction module to reason the target emotion. Experimental results show that our model achieves better performance over most baseline models. Our source code is publicly available at https://github.com/circle-hit/KBCIN.


Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation

arXiv.org Artificial Intelligence

The task of empathetic response generation aims to understand what feelings a speaker expresses on his/her experiences and then reply to the speaker appropriately. To solve the task, it is essential to model the content-emotion duality of a dialogue, which is composed of the content view (i.e., what personal experiences are described) and the emotion view (i.e., the feelings of the speaker on these experiences). To this end, we design a framework to model the Content-Emotion Duality (CEDual) via disentanglement for empathetic response generation. With disentanglement, we encode the dialogue history from both the content and emotion views, and then generate the empathetic response based on the disentangled representations, thereby both the content and emotion information of the dialogue history can be embedded in the generated response. The experiments on the benchmark dataset EMPATHETICDIALOGUES show that the CEDual model achieves state-of-the-art performance on both automatic and human metrics, and it also generates more empathetic responses than previous methods.


TSAM: A Two-Stream Attention Model for Causal Emotion Entailment

arXiv.org Artificial Intelligence

Causal Emotion Entailment (CEE) aims to discover the potential causes behind an emotion in a conversational utterance. Previous works formalize CEE as independent utterance pair classification problems, with emotion and speaker information neglected. From a new perspective, this paper considers CEE in a joint framework. We classify multiple utterances synchronously to capture the correlations between utterances in a global view and propose a Two-Stream Attention Model (TSAM) to effectively model the speaker's emotional influences in the conversational history. Specifically, the TSAM comprises three modules: Emotion Attention Network (EAN), Speaker Attention Network (SAN), and interaction module. The EAN and SAN incorporate emotion and speaker information in parallel, and the subsequent interaction module effectively interchanges relevant information between the EAN and SAN via a mutual BiAffine transformation. Extensive experimental results demonstrate that our model achieves new State-Of-The-Art (SOT A) performance and outperforms baselines remarkably.


Artificial emotional intelligence: a safer, smarter future with 5G and emotion recognition

#artificialintelligence

With the advent of 5G communication technology and its integration with AI, we are looking at the dawn of a new era in which people, machines, objects, and devices are connected like never before. This smart era will be characterized by smart facilities and services such as self-driving cars, smart UAVs, and intelligent healthcare. This will be the aftermath of a technological revolution. But the flip side of such technological revolution is that AI itself can be used to attack or threaten the security of 5G-enabled systems which, in turn, can greatly compromise their reliability. It is, therefore, imperative to investigate such potential security threats and explore countermeasures before a smart world is realized.


Scientists are creating AI that can detect "anger or fear" in a public area

#artificialintelligence

The dawn of 5G communication technology comes with questions about the imminent future. The smart era can be seen in cities, transport systems, our personal devices and how we're tracking COVID-19 across the world. Robots can even participate in delivering therapy to humans, which is one of the most delicate, emotion-based exchanges that exists. If not for the internet and AI, vaccine candidates would take months to find. AI that examines data and narrows down outcomes have empowered policymakers to understand COVID-19's impact, before it happens.


Exploiting Emotion on Reviews for Recommender Systems

AAAI Conferences

Review history is widely used by recommender systems to infer users' preferences and help find the potential interests from the huge volumes of data, whereas it also brings in great concerns on the sparsity and cold-start problems due to its inadequacy. Psychology and sociology research has shown that emotion information is a strong indicator for users' preferences. Meanwhile, with the fast development of online services, users are willing to express their emotion on others' reviews, which makes the emotion information pervasively available. Besides, recent research shows that the number of emotion on reviews is always much larger than the number of reviews. Therefore incorporating emotion on reviews may help to alleviate the data sparsity and cold-start problems for recommender systems. In this paper, we provide a principled and mathematical way to exploit both positive and negative emotion on reviews, and propose a novel framework MIRROR, exploiting eMotIon on Reviews for RecOmmendeR systems from both global and local perspectives. Empirical results on real-world datasets demonstrate the effectiveness of our proposed framework and further experiments are conducted to understand how emotion on reviews works for the proposed framework.