Collaborating Authors: Hu, Guimin


Retrieving Implicit and Explicit Emotional Events Using Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) have garnered significant attention in recent years due to their impressive performance. While considerable research has evaluated these models from various perspectives, the extent to which LLMs can perform implicit and explicit emotion retrieval remains largely unexplored. To address this gap, this study investigates LLMs' emotion retrieval capabilities in commonsense knowledge. Through extensive experiments involving multiple models, we systematically evaluate LLMs' ability to retrieve emotional events. Specifically, we propose a supervised contrastive probing method to assess LLMs' performance on implicit and explicit emotion retrieval, as well as the diversity of the emotional events they retrieve. The results offer valuable insights into the strengths and limitations of LLMs in handling emotion retrieval.
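
The abstract does not spell out the probing setup, so the following is only a minimal sketch of what a supervised contrastive probe over frozen LLM representations could look like; the hidden size, projection head, temperature, and label set are illustrative assumptions rather than the authors' configuration.

```python
# Minimal sketch of a supervised contrastive probe over frozen LLM embeddings;
# the hidden size, projection head, temperature, and labels are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveProbe(nn.Module):
    """Small projection head trained on top of frozen LLM hidden states."""
    def __init__(self, hidden_dim=4096, proj_dim=128):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(hidden_dim, proj_dim),
            nn.ReLU(),
            nn.Linear(proj_dim, proj_dim),
        )

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)

def sup_con_loss(z, labels, temperature=0.07):
    """Supervised contrastive loss: embeddings sharing an emotion label are
    pulled together; all other batch members act as negatives."""
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    logits = z @ z.t() / temperature
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()  # stability
    exp_logits = torch.exp(logits).masked_fill(eye, 0.0)               # drop self-pairs
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    return -(log_prob * pos_mask).sum(dim=1).div(pos_counts).mean()

# Toy usage: random stand-ins for frozen LLM embeddings and emotion labels.
probe = ContrastiveProbe()
emb = torch.randn(16, 4096)
labels = torch.randint(0, 4, (16,))
loss = sup_con_loss(probe(emb), labels)
loss.backward()
```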


Grounding Emotional Descriptions to Electrovibration Haptic Signals

arXiv.org Artificial Intelligence

Designing and displaying haptic signals with sensory and emotional attributes can improve the user experience in various applications. Free-form user language provides rich sensory and emotional information for haptic design (e.g., ``This signal feels smooth and exciting''), but little work exists on linking user descriptions to haptic signals (i.e., language grounding). To address this gap, we conducted a study where 12 users described the feel of 32 signals perceived on a surface haptics (i.e., electrovibration) display. We developed a computational pipeline using natural language processing (NLP) techniques, such as GPT-3.5 Turbo and word embedding methods, to extract sensory and emotional keywords and group them into semantic clusters (i.e., concepts). We linked the keyword clusters to haptic signal features (e.g., pulse count) using correlation analysis. The proposed pipeline demonstrates the viability of a computational approach to analyzing haptic experiences. We discuss our future plans for creating a predictive model of haptic experience.
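
As a rough illustration of the cluster-then-correlate stage described above, the sketch below groups pre-computed keyword embeddings into concepts and correlates per-signal concept usage with one signal feature; the embedding dimensionality, cluster count, and pulse_count values are placeholders, not the study's data.

```python
# Hypothetical sketch of the cluster-then-correlate stage; the keyword
# embeddings, cluster count, and per-signal "pulse_count" feature below
# are random placeholders standing in for the study's data.
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

keyword_vecs = rng.normal(size=(200, 50))       # embeddings of extracted keywords
keyword_signal = rng.integers(0, 32, size=200)  # which of 32 signals each keyword described
pulse_count = rng.integers(1, 20, size=32)      # one haptic feature per signal

# 1) Group keywords into semantic clusters ("concepts").
n_clusters = 8
concepts = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(keyword_vecs)

# 2) Count how often each concept was used to describe each signal.
usage = np.zeros((32, n_clusters))
for sig, c in zip(keyword_signal, concepts):
    usage[sig, c] += 1

# 3) Correlate concept usage with the haptic signal feature.
for c in range(n_clusters):
    r, p = pearsonr(usage[:, c], pulse_count)
    print(f"concept {c}: r={r:+.2f}, p={p:.3f}")
```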


FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture

arXiv.org Artificial Intelligence

Food is a rich and varied dimension of cultural heritage, crucial to both individuals and social groups. To bridge the gap in the literature on the often-overlooked regional diversity in this domain, we introduce FoodieQA, a manually curated, fine-grained image-text dataset capturing the intricate features of food cultures across various regions in China. We evaluate vision-language models (VLMs) and large language models (LLMs) on newly collected, unseen food images and corresponding questions. FoodieQA comprises three multiple-choice question-answering tasks where models need to answer questions based on multiple images, a single image, and text-only descriptions, respectively. While LLMs excel at text-based question answering, surpassing human accuracy, the open-weights VLMs still fall short by 41% on multi-image and 21% on single-image VQA tasks, although closed-weights models perform closer to human levels (within 10%).

[Figure 1: An example of regional food differences in referring to hotpot in China. The depicted soups and dishware visually reflect the ingredients, flavors, and traditions of these regions: Beijing in the north, Sichuan in the southwest, and Guangdong on the south coast.]
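
For a sense of how such a multiple-choice evaluation can be scored, here is a minimal sketch; the question layout and the dummy model are assumptions for illustration and do not reproduce the FoodieQA evaluation code.

```python
# Hypothetical sketch of scoring a multiple-choice task like those in
# FoodieQA; the data layout and the model_answer callable are assumptions.
from typing import Callable, Sequence

def multiple_choice_accuracy(
    questions: Sequence[dict],
    model_answer: Callable[[dict], str],
) -> float:
    """Each question dict holds 'images', 'question', 'choices' (A-D), 'answer'."""
    correct = sum(model_answer(q) == q["answer"] for q in questions)
    return correct / len(questions)

# Toy usage: a dummy model that always picks option "A".
toy_questions = [
    {"images": ["hotpot_beijing.jpg"],
     "question": "Which region is this hotpot from?",
     "choices": {"A": "Beijing", "B": "Sichuan", "C": "Guangdong", "D": "Chaoshan"},
     "answer": "A"},
]
acc = multiple_choice_accuracy(toy_questions, lambda q: "A")
print(f"accuracy: {acc:.2%}")  # the paper compares model accuracy against human accuracy
```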


UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause

arXiv.org Artificial Intelligence

Multimodal emotion recognition in conversation (MERC) and multimodal emotion-cause pair extraction (MECPE) have recently garnered significant attention. Emotions are expressions of affect or feelings, while emotion causes are the specific events, thoughts, or situations that trigger them. The two are like two sides of a coin, collectively describing human behaviors and intents. However, most existing works treat MERC and MECPE as separate tasks, which may pose challenges for integrating emotion and cause in real-world applications. In this paper, we propose a Unified Multimodal Emotion recognition and Emotion-Cause analysis framework (UniMEEC) to explore the causality and complementarity between emotion and emotion cause. Concretely, UniMEEC reformulates the MERC and MECPE tasks as two mask prediction problems, enhancing the interaction between emotion and cause. Meanwhile, UniMEEC shares prompt learning among modalities to probe modality-specific knowledge from the pre-trained model. Furthermore, we propose task-specific hierarchical context aggregation to control the flow of information to each task. Experimental results on four public benchmark datasets verify the model's performance on the MERC and MECPE tasks, showing consistent improvements over state-of-the-art methods.
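
The concrete prompt templates are not given in this abstract; the snippet below is only an illustrative guess at how MERC and MECPE could be cast as mask prediction, with made-up templates and an assumed emotion label set.

```python
# Illustrative guess at masked-prompt construction for unified emotion and
# emotion-cause prediction; the templates and label set are assumptions.
EMOTIONS = ["joy", "sadness", "anger", "surprise", "fear", "disgust", "neutral"]
# ^ assumed label space that a verbalizer would map the filled mask token onto

def emotion_prompt(utterance: str, mask_token: str = "<mask>") -> str:
    """MERC as mask prediction: the model fills the masked emotion slot."""
    return f"Utterance: {utterance} The speaker feels {mask_token}."

def cause_prompt(utterance: str, candidate: str, mask_token: str = "<mask>") -> str:
    """MECPE as mask prediction: the model fills a yes/no slot for a candidate cause."""
    return (f"Utterance: {utterance} Candidate cause: {candidate} "
            f"This candidate is {mask_token} a cause of the emotion.")

# Toy usage: templates that a masked language model head would complete.
print(emotion_prompt("I finally got the job!"))
print(cause_prompt("I finally got the job!", "got the job"))
```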


Emotion Prediction Oriented method with Multiple Supervisions for Emotion-Cause Pair Extraction

arXiv.org Artificial Intelligence

The emotion-cause pair extraction (ECPE) task aims to extract all pairs of emotions and their causes from unannotated emotion text. Previous works usually extract emotion-cause pairs from the two perspectives of emotion and cause. However, emotion extraction is more crucial to the ECPE task than cause extraction. Motivated by this analysis, we propose an end-to-end emotion-cause pair extraction approach oriented toward emotion prediction (EPO-ECPE), aiming to fully exploit the potential of emotion prediction to enhance emotion-cause pair extraction. Considering the strong dependence between emotion prediction and emotion-cause pair extraction, we propose a synchronization mechanism to share their improvements during training: gains in emotion prediction facilitate emotion-cause pair extraction, and the extraction results are in turn used to improve the accuracy of emotion prediction. For emotion-cause pair extraction, we divide supervision into genuine pair supervision and fake pair supervision, where genuine pair supervision learns from pairs that are more likely to be emotion-cause pairs and fake pair supervision learns from the remaining pairs. In this way, emotion-cause pairs can be extracted directly from the genuine pairs, reducing the difficulty of extraction. Experimental results show that our approach outperforms 13 compared systems and achieves new state-of-the-art performance.
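
As a loose sketch of how genuine and fake pair supervision might be combined with an emotion-prediction objective, consider the following; the score threshold and loss weights are assumptions, not the values used in the paper.

```python
# Hypothetical sketch of splitting candidate clause pairs into "genuine" and
# "fake" supervision groups by predicted score and combining their losses with
# an emotion-prediction loss; threshold and weights are assumed, not the paper's.
import torch
import torch.nn.functional as F

def epo_style_loss(pair_scores, pair_labels, emo_logits, emo_labels,
                   threshold=0.5, w_genuine=1.0, w_fake=0.5, w_emotion=1.0):
    """pair_scores: (N,) logits over candidate emotion-cause pairs.
    pair_labels: (N,) 1 if the pair is a true emotion-cause pair, else 0.
    emo_logits/emo_labels: auxiliary emotion-prediction head outputs and targets."""
    probs = torch.sigmoid(pair_scores)
    genuine = probs >= threshold  # pairs more likely to be emotion-cause pairs
    fake = ~genuine               # the remaining pairs

    def bce(mask):
        if mask.any():
            return F.binary_cross_entropy_with_logits(
                pair_scores[mask], pair_labels[mask].float())
        return pair_scores.new_zeros(())

    loss_emotion = F.cross_entropy(emo_logits, emo_labels)
    return (w_genuine * bce(genuine) + w_fake * bce(fake)
            + w_emotion * loss_emotion)

# Toy usage with random pair scores, pair labels, and emotion logits.
loss = epo_style_loss(torch.randn(10), torch.randint(0, 2, (10,)),
                      torch.randn(4, 6), torch.randint(0, 6, (4,)))
print(float(loss))
```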


UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition

arXiv.org Artificial Intelligence

Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for enabling computers to understand human behaviors. From a psychological perspective, emotions are expressions of affect or feelings over a short period, while sentiments are formed and held over a longer period. However, most existing works study sentiment and emotion separately and do not fully exploit the complementary knowledge behind the two. In this paper, we propose a multimodal sentiment knowledge-sharing framework (UniMSE) that unifies the MSA and ERC tasks in terms of features, labels, and models. We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the differences and consistencies between sentiments and emotions. Experiments on four public benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the effectiveness of the proposed method, which achieves consistent improvements over state-of-the-art methods.
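
A minimal sketch of one way to realize contrastive learning between modalities follows, assuming per-utterance text and audio features are already extracted; the feature size and temperature are illustrative, and this is not the UniMSE implementation.

```python
# Hypothetical sketch of an inter-modality contrastive objective in the spirit
# of contrastive learning between modalities; dimensions and temperature are
# assumptions rather than the paper's settings.
import torch
import torch.nn.functional as F

def cross_modal_info_nce(text_feat, audio_feat, temperature=0.1):
    """Treat matching (text, audio) pairs from the same utterance as positives
    and all other pairings in the batch as negatives."""
    t = F.normalize(text_feat, dim=-1)
    a = F.normalize(audio_feat, dim=-1)
    logits = t @ a.t() / temperature   # (B, B) cross-modal similarity matrix
    targets = torch.arange(len(t))     # the diagonal entries are the true pairs
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage with random text and audio features for a batch of 8 utterances.
loss = cross_modal_info_nce(torch.randn(8, 256), torch.randn(8, 256))
print(float(loss))
```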