Can LLMs Faithfully Explain Themselves in Low-Resource Languages? A Case Study on Emotion Detection in Persian
Mehrazar, Mobina, Yousefi, Mohammad Amin, Beygi, Parisa Abolfath, Bahrak, Behnam
Large language models (LLMs) are increasingly used to generate self-explanations alongside their predictions, a practice that raises concerns about the faithfulness of these explanations, especially in low-resource languages. This study evaluates the faithfulness of LLM-generated explanations in the context of emotion classification in Persian, a low-resource language, by comparing the influential words identified by the model against those identified by human annotators. We assess faithfulness using confidence scores derived from token-level log-probabilities. Two prompting strategies, differing in the order of explanation and prediction (Predict-then-Explain and Explain-then-Predict), are tested for their impact on explanation faithfulness. Our results reveal that while LLMs achieve strong classification performance, their generated explanations often diverge from faithful reasoning, showing greater agreement with each other than with human judgments. These results highlight the limitations of current explanation methods and metrics, emphasizing the need for more robust approaches to ensure LLM reliability in multilingual and low-resource contexts.
- North America > United States (0.28)
- North America > Mexico (0.28)
- North America > Canada (0.28)
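The abstract above mentions confidence scores derived from token-level log-probabilities but does not spell out the aggregation. A minimal sketch of one common choice, the length-normalized joint probability of the label tokens (the function name and the normalization scheme are assumptions, not necessarily the authors' exact formula):

```python
import math

def sequence_confidence(token_logprobs):
    """Collapse the per-token log-probabilities of a generated label
    (e.g. the tokens of "joy") into one confidence score in (0, 1],
    using the length-normalized joint probability exp(mean log p_t)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

Length normalization keeps multi-token labels comparable to single-token ones, since an unnormalized product of probabilities shrinks with every extra token.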
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
Guo, Yiwei, Du, Chenpeng, Chen, Xie, Yu, Kai
Although current neural text-to-speech (TTS) models are able to generate high-quality speech, intensity-controllable emotional TTS is still a challenging task. Most existing methods need external optimizations for intensity calculation, leading to suboptimal results or degraded quality. In this paper, we propose EmoDiff, a diffusion-based TTS model where emotion intensity can be manipulated by a proposed soft-label guidance technique derived from classifier guidance. Specifically, instead of being guided with a one-hot vector for the specified emotion, EmoDiff is guided with a soft label where the values of the specified emotion and Neutral are set to α and 1−α, respectively. The α here represents the emotion intensity and can be chosen from 0 to 1. Our experiments show that EmoDiff can precisely control the emotion intensity while maintaining high voice quality. Moreover, diverse speech with a specified emotion intensity can be generated by sampling in the reverse denoising process.
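The soft-label guidance idea can be made concrete for the simplest possible case: a linear emotion classifier. The sketch below computes the gradient of the weighted objective α·log p(emotion|x) + (1−α)·log p(Neutral|x) with respect to the input, which is the quantity classifier guidance adds to the denoising step. The linear classifier and the function name are illustrative assumptions, not EmoDiff's actual architecture:

```python
import numpy as np

def soft_label_guidance_grad(W, b, x_t, emotion_idx, neutral_idx, alpha):
    """Gradient w.r.t. x_t of
        alpha * log p(emotion | x_t) + (1 - alpha) * log p(Neutral | x_t)
    for a linear classifier p = softmax(W @ x_t + b).

    For a softmax classifier, the logit-gradient of sum_c w_c * log p_c
    is w - p whenever the weights w sum to 1, so the input gradient is
    simply W.T @ (w - p)."""
    logits = W @ x_t + b
    z = logits - logits.max()          # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    w = np.zeros_like(p)
    w[emotion_idx] = alpha
    w[neutral_idx] = 1.0 - alpha
    return W.T @ (w - p)
```

Setting α = 1 recovers ordinary one-hot classifier guidance toward the target emotion; α = 0 guides toward Neutral.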
Controlling Perceived Emotion in Symbolic Music Generation with Monte Carlo Tree Search
Ferreira, Lucas N., Mou, Lili, Whitehead, Jim, Lelis, Levi H. S.
This paper presents a new approach for controlling emotion in symbolic music generation with Monte Carlo Tree Search. We use Monte Carlo Tree Search as a decoding mechanism to steer the probability distribution learned by a language model towards a given emotion. At every step of the decoding process, we use Predictor Upper Confidence for Trees (PUCT) to search for sequences that maximize the average values of emotion and quality as given by an emotion classifier and a discriminator, respectively. We use a language model as PUCT's policy and a combination of the emotion classifier and the discriminator as its value function. To decode the next token in a piece of music, we sample from the distribution of node visits created during the search. We evaluate the quality of the generated samples with respect to human-composed pieces using a set of objective metrics computed directly from the generated samples. We also perform a user study to evaluate how human subjects perceive the generated samples' quality and emotion. We compare PUCT against Stochastic Bi-Objective Beam Search (SBBS) and Conditional Sampling (CS). Results suggest that PUCT outperforms SBBS and CS in almost all metrics of music quality and emotion.
- North America > Canada > Alberta (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- Research Report > New Finding (0.88)
- Research Report > Experimental Study (0.68)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
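The two decoding mechanics named in the abstract, PUCT node selection and sampling the next token from root visit counts, can be sketched in a few lines. The function names and the c_puct constant are assumptions; the Q-value here stands in for the averaged emotion/quality score the paper gets from its classifier and discriminator:

```python
import math
import random

def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.0):
    """Standard PUCT selection score: exploitation (mean value of the
    node) plus a prior-weighted exploration bonus that shrinks as the
    child is visited more often."""
    return q_value + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def sample_next_token(root_visit_counts, rng=random):
    """Decode the next token by sampling proportionally to the visit
    counts accumulated at the root during the search."""
    tokens = list(root_visit_counts)
    weights = [root_visit_counts[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Sampling from visit counts (rather than always taking the most-visited child) is what lets the system generate diverse continuations while still favoring tokens the search judged strong.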
Controlled Cue Generation for Play Scripts
Dirik, Alara, Donmez, Hilal, Yanardag, Pinar
In this paper, we use a large-scale play scripts dataset to propose the novel task of theatrical cue generation from dialogues. Using over one million lines of dialogue and cues, we approach the problem of cue generation as a controlled text generation task, and show how cues can be used to enhance the impact of dialogue using a language model conditioned on a dialogue/cue discriminator. In addition, we explore the use of topic keywords and emotions for controlled text generation. Extensive quantitative and qualitative experiments show that language models can be successfully used to generate plausible and attribute-controlled texts in highly specialised domains such as play scripts.
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Jordan (0.04)
Evaluator for Emotionally Consistent Chatbots
Liu, Chenxiao, Deng, Guanzhi, Ji, Tao, Tang, Difei, Zheng, Silai
One challenge for evaluating current sequence- or dialogue-level chatbots, such as Empathetic Open-domain Conversation Models, is to determine whether the chatbot performs in an emotionally consistent way. The most recent work only evaluates on the aspects of context coherence, language fluency, response diversity, or logical self-consistency between dialogues. In this research, we aim to train an evaluator that can effectively evaluate the emotional consistency of chatbots.
MEDCOD: A Medically-Accurate, Emotive, Diverse, and Controllable Dialog System
Compton, Rhys, Valmianski, Ilya, Deng, Li, Huang, Costa, Katariya, Namit, Amatriain, Xavier, Kannan, Anitha
We present MEDCOD, a Medically-Accurate, Emotive, Diverse, and Controllable Dialog system with a unique approach to the natural language generator module. MEDCOD has been developed and evaluated specifically for the history taking task. It integrates the advantage of a traditional modular approach to incorporate (medical) domain knowledge with modern deep learning techniques to generate flexible, human-like natural language expressions. Two key aspects of MEDCOD's natural language output are described in detail. First, the generated sentences are emotive and empathetic, similar to how a doctor would communicate to the patient. Second, the generated sentence structures and phrasings are varied and diverse while maintaining medical consistency with the desired medical concept (provided by the dialogue manager module of MEDCOD). Experimental results demonstrate the effectiveness of our approach in creating a human-like medical dialogue system. Relevant code is available at https://github.com/curai/curai-research/tree/main/MEDCOD
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)
- Health & Medicine > Health Care Technology > Telehealth (0.46)
Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain-Adversarial Neural Network
Cai, Xiong, Wu, Zhiyong, Zhong, Kuo, Su, Bin, Dai, Dongyang, Meng, Helen
By using deep learning approaches, Speech Emotion Recognition (SER) on a single domain has achieved many excellent results. However, cross-domain SER is still a challenging task due to the distribution shift between source and target domains. In this work, we propose a Domain-Adversarial Neural Network (DANN) based approach to mitigate this distribution shift problem for cross-lingual SER. Specifically, we add a language classifier and gradient reversal layer after the feature extractor to force the learned representation to be both language-independent and emotion-meaningful. Our method is unsupervised, i.e., labels on the target language are not required, which makes it easier to apply our method to other languages. Experimental results show the proposed method provides an average absolute improvement of 3.91% over the baseline system for the arousal and valence classification tasks. Furthermore, we find that batch normalization is beneficial to the performance gain of DANN. Therefore, we also explore the effect of different ways of data combination for batch normalization.
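The gradient reversal layer at the heart of DANN is simple enough to show directly: it is the identity in the forward pass and flips (and scales) the gradient in the backward pass, so the feature extractor is trained to *worsen* the language classifier and thus produce language-independent features. A framework-free sketch with manual forward/backward methods; real code would register this as a custom autograd function in the training framework:

```python
import numpy as np

class GradientReversalLayer:
    """Identity on the forward pass; gradient multiplied by -lam on the
    way back, so upstream layers receive an adversarial signal from the
    language classifier sitting above this layer."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed, scaled gradient
```

The scalar lam controls how strongly the adversarial language loss pushes back against the feature extractor; it is often annealed from 0 upward during training.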
CARE: Commonsense-Aware Emotional Response Generation with Latent Concepts
Zhong, Peixiang, Wang, Di, Li, Pengfei, Zhang, Chen, Wang, Hao, Miao, Chunyan
Rationality and emotion are two fundamental elements of humans. Endowing agents with rationality and emotion has been one of the major milestones in AI. However, in the field of conversational AI, most existing models only specialize in one aspect and neglect the other, which often leads to dull or unrelated responses. In this paper, we hypothesize that combining rationality and emotion into conversational agents can improve response quality. To test the hypothesis, we focus on one fundamental aspect of rationality, i.e., commonsense, and propose CARE, a novel model for commonsense-aware emotional response generation. Specifically, we first propose a framework to learn and construct commonsense-aware emotional latent concepts of the response given an input message and a desired emotion. We then propose three methods to collaboratively incorporate the latent concepts into response generation. Experimental results on two large-scale datasets support our hypothesis and show that our model can produce more accurate and commonsense-aware emotional responses and achieve better human ratings than state-of-the-art models that only specialize in one aspect.
- Asia > Singapore (0.05)
- North America > United States (0.04)
- Asia > China (0.04)
- Research Report > Promising Solution (0.68)
- Research Report > Experimental Study (0.48)
Computer-Generated Music for Tabletop Role-Playing Games
Ferreira, Lucas N., Lelis, Levi H. S., Whitehead, Jim
In this paper we present Bardo Composer, a system to generate background music for tabletop role-playing games. Bardo Composer uses a speech recognition system to translate player speech into text, which is classified according to a model of emotion. Bardo Composer then uses Stochastic Bi-Objective Beam Search, a variant of Stochastic Beam Search that we introduce in this paper, with a neural model to generate musical pieces conveying the desired emotion. We performed a user study with 116 participants to evaluate whether people are able to correctly identify the emotion conveyed in the pieces generated by the system. In our study we used pieces generated for Call of the Wild, a Dungeons and Dragons campaign available on YouTube. Our results show that human subjects could correctly identify the emotion of the generated music pieces as accurately as they were able to identify the emotion of pieces written by humans.
- North America > Canada > Alberta (0.14)
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Media > Music (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
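The abstract introduces Stochastic Bi-Objective Beam Search without detailing it; a minimal sketch of one step, under the assumption that candidates are scored by a convex mix of language-model log-probability and an emotion score and that survivors are sampled rather than kept greedily (the paper's actual algorithm may differ in these details):

```python
import math
import random

def sbbs_step(beams, expand_fn, emotion_fn, beam_width, mix=0.5, rng=random):
    """One step of a stochastic bi-objective beam search sketch.

    beams:      list of (token_list, cumulative_logprob)
    expand_fn:  seq -> list of (token, token_logprob) continuations
    emotion_fn: seq -> float, higher = closer to the target emotion
    Survivors are sampled in proportion to exp(score) instead of taken
    greedily, which trades a little quality for diversity."""
    candidates = []
    for seq, logp in beams:
        for tok, tok_logp in expand_fn(seq):
            new_seq = seq + [tok]
            new_logp = logp + tok_logp
            score = mix * new_logp + (1.0 - mix) * emotion_fn(new_seq)
            candidates.append((new_seq, new_logp, score))
    max_s = max(s for _, _, s in candidates)  # for numerical stability
    weights = [math.exp(s - max_s) for _, _, s in candidates]
    survivors = rng.choices(candidates, weights=weights, k=beam_width)
    return [(seq, logp) for seq, logp, _ in survivors]
```

The mix parameter plays the role of balancing the two objectives (music quality via the language model, target emotion via the classifier).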
Emotional AI platform reveals that smiles are down 32% due to COVID-19 - ClickZ
Headquartered in London, Realeyes is an eye-tracking and emotion measurement platform that uses AI and machine learning to gain insight into human behavior and expression. Their clients include Buzzfeed, Coca-Cola, Conde Nast, eBay, Mars, and Publicis Groupe, among others. Realeyes uses front-facing cameras, computer vision and machine learning technologies to detect attention and emotion among opt-in audiences as they watch video content. ClickZ recently spoke with Max Kalehoff, VP of Marketing & Growth for Realeyes to discuss the company's innovative technology and the capabilities they bring to marketers and publishers. Kalehoff learned about Realeyes after co-leading a panel presentation with Realeyes's CEO at the Sustainable Brands Conference in 2017.