Incremental Boosting Convolutional Neural Network for Facial Action Unit Recognition
Shizhong Han, Zibo Meng, Ahmed-Shehab Khan, Yan Tong
Recognizing facial action units (AUs) from spontaneous facial expressions is still a challenging problem. Most recently, CNNs have shown promise on facial AU recognition. However, the learned CNNs are often overfitted and do not generalize well to unseen subjects due to limited AU-coded training images. We proposed a novel Incremental Boosting CNN (IB-CNN) to integrate boosting into the CNN via an incremental boosting layer that selects discriminative neurons from the lower layer and is incrementally updated on successive mini-batches. In addition, a novel loss function that accounts for errors from both the incremental boosted classifier and individual weak classifiers was proposed to fine-tune the IB-CNN. Experimental results on four benchmark AU databases have demonstrated that the IB-CNN yields significant improvement over the traditional CNN and the boosting CNN without incremental learning, as well as outperforming the state-of-the-art CNN-based methods in AU recognition. The improvement is more impressive for the AUs that have the lowest frequencies in the databases.
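The incremental update described in the abstract can be illustrated with a minimal sketch: neuron activations are treated as sign-based weak classifiers, the most discriminative ones are selected on each mini-batch, and their weights are averaged into the running ensemble. The function name, the agreement-based scoring, and the simple running average are illustrative assumptions, not the paper's exact formulation.

```python
def incremental_boost(history_weights, neuron_scores, labels, n_select, t):
    """One incremental-boosting step over a mini-batch (illustrative sketch).

    neuron_scores: per-sample lists of neuron activations; each neuron is
                   treated as a weak classifier via the sign of its activation.
    labels: per-sample labels in {-1, +1}.
    t: number of mini-batches already absorbed into history_weights.
    """
    n_neurons = len(history_weights)
    # Score each neuron by how often its sign agrees with the label.
    agreement = [
        sum(1 for row, y in zip(neuron_scores, labels)
            if (1 if row[j] >= 0 else -1) == y) / len(labels)
        for j in range(n_neurons)
    ]
    # Keep only the n_select most discriminative neurons for this batch.
    selected = sorted(range(n_neurons), key=lambda j: agreement[j])[-n_select:]
    total = sum(agreement[j] for j in selected)
    batch_weights = [agreement[j] / total if j in selected else 0.0
                     for j in range(n_neurons)]
    # Incrementally average the batch ensemble into the running ensemble,
    # so earlier mini-batches are not forgotten.
    return [(t * h + b) / (t + 1) for h, b in zip(history_weights, batch_weights)]
```

The incremental average is what distinguishes this from per-batch boosting: the selected weak classifiers and their weights persist across mini-batches instead of being recomputed from scratch.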
Augmenting Dialog with Think-Aloud Utterances for Modeling Individual Personality Traits by LLM
Ishikura, Seiya, Yamada, Hiroaki, Hiraoka, Tatsuya, Tokunaga, Takenobu
This study proposes augmenting dialog data with think-aloud utterances (TAUs) for modeling individual personalities in text chat by an LLM. A TAU is a verbalization of a speaker's thought before articulating the utterance. We expect that "persona LLMs" trained with TAU-augmented data can better mimic the speaker's personality traits. We tested whether the trained persona LLMs capture the human personality with respect to the Big Five, a framework characterizing human personality traits from five aspects. The results showed that LLMs trained with TAU-augmented data align more closely with the speakers' Agreeableness and Neuroticism of the Big Five than those trained with the original dialog data. We also found that the quality of TAU augmentation impacts the persona LLM's performance.
The Behavioural Translation Style Space: Towards simulating the temporal dynamics of affect, behaviour, and cognition in human translation production
Carl, Michael, Mizowaki, Takanori, Ray, Aishvarya, Yamada, Masaru, Bandaru, Devi Sri, Ren, Xinyue
The paper introduces a novel behavioural translation style space (BTSS) that describes possible behavioural translation patterns. The suggested BTSS is organized as a hierarchical structure that entails various embedded processing layers. We posit that observable translation behaviour - i.e. eye and finger movements - is fundamental when executing the physical act of translation but it is caused and shaped by higher-order cognitive processes and affective translation states. We analyse records of keystrokes and gaze data as indicators of the hidden mental processing structure and organize the behavioural patterns as a multi-layered embedded BTSS. We develop a perspective in which the BTSS serves as the basis for a computational translation agent to simulate the temporal dynamics of affect, behavioural routines and cognition during human translation production.
Generic Knowledge as Probabilities
We adapt generic knowledge from existing studies that is applicable across different datasets. Generic knowledge is expressed as probabilities and is categorized into three types: expression-dependent single AU probabilities, expression-dependent joint AU probabilities, and expression-independent joint AU probabilities. 1) For expression-dependent single AU probabilities, two sources are considered. According to FACS, given an expression, AUs can be grouped into primary (P) and secondary (S) categories. The primary AUs are the most expressive AUs with respect to the expression, and the secondary AUs may co-occur with primary AUs, providing additional support for the expression.
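The expression-dependent single-AU probabilities described above can be organized as simple lookup tables keyed by (expression, AU) pairs. The AU numbers and probability values below are hypothetical placeholders for illustration, not the figures actually compiled in the paper.

```python
# Hypothetical probabilities for illustration only; the real values are
# compiled from FACS and prior studies.
PRIMARY_AU_PROB = {
    ("happiness", 12): 0.90,  # AU12 (lip corner puller): primary for happiness
    ("happiness", 6): 0.85,   # AU6 (cheek raiser): primary for happiness
}
SECONDARY_AU_PROB = {
    ("happiness", 25): 0.40,  # AU25 (lips part): may co-occur, adding support
}

def au_given_expression(expression, au):
    """Look up P(AU = 1 | expression); unlisted pairs are unknown (None)."""
    key = (expression, au)
    if key in PRIMARY_AU_PROB:
        return PRIMARY_AU_PROB[key]
    return SECONDARY_AU_PROB.get(key)
```

Separating primary from secondary tables mirrors the P/S grouping from FACS: a primary entry, if present, takes precedence over a secondary one for the same pair.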
Saliency-guided Emotion Modeling: Predicting Viewer Reactions from Video Stimuli
Yaragoppa, Akhila, Siddharth
Understanding the emotional impact of videos is crucial for applications in content creation, advertising, and Human-Computer Interaction (HCI). Traditional affective computing methods rely on self-reported emotions, facial expression analysis, and biosensing data, yet they often overlook the role of visual saliency -- the naturally attention-grabbing regions within a video. In this study, we utilize deep learning to introduce a novel saliency-based approach to emotion prediction by extracting two key features: saliency area and number of salient regions. Using the HD2S saliency model and OpenFace facial action unit analysis, we examine the relationship between video saliency and viewer emotions. Our findings reveal three key insights: (1) Videos with multiple salient regions tend to elicit high-valence, low-arousal emotions, (2) Videos with a single dominant salient region are more likely to induce low-valence, high-arousal responses, and (3) Self-reported emotions often misalign with facial expression-based emotion detection, suggesting limitations in subjective reporting. By leveraging saliency-driven insights, this work provides a computationally efficient and interpretable alternative for emotion modeling, with implications for content creation, personalized media experiences, and affective computing research.
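The two saliency features named above (salient area and number of salient regions) can be computed from a thresholded saliency map with a flood fill. This is a pure-Python sketch assuming a 2D list of values in [0, 1] and a hypothetical threshold of 0.5; it is not the HD2S pipeline's actual post-processing.

```python
from collections import deque

def saliency_features(saliency_map, threshold=0.5):
    """Return (salient-area fraction, number of 4-connected salient regions)."""
    h, w = len(saliency_map), len(saliency_map[0])
    salient = [[v >= threshold for v in row] for row in saliency_map]
    seen = [[False] * w for _ in range(h)]
    area, regions = 0, 0
    for i in range(h):
        for j in range(w):
            if not salient[i][j] or seen[i][j]:
                continue
            regions += 1                 # a new connected region starts here
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:                 # flood-fill the whole region
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and salient[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return area / (h * w), regions
```

A map with one large dominant region and a map with several scattered regions would yield similar area fractions but very different region counts, which is why the two features are kept separate.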
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
Chen, Hejia, Zhang, Haoxian, Zhang, Shoulong, Liu, Xiaoqiang, Zhuang, Sisi, Zhang, Yuan, Wan, Pengfei, Zhang, Di, Li, Shuai
Speech-driven 3D talking face methods should offer both accurate lip synchronization and controllable expressions. Previous methods solely adopt discrete emotion labels to globally control expressions throughout sequences, which limits flexible fine-grained facial control within the spatiotemporal domain. We propose a diffusion-transformer-based 3D talking face generation model, Cafe-Talk, which simultaneously incorporates coarse- and fine-grained multimodal control conditions. Nevertheless, the entanglement of multiple conditions challenges achieving satisfying performance. To disentangle speech audio and fine-grained conditions, we employ a two-stage training pipeline. Specifically, Cafe-Talk is initially trained using only speech audio and coarse-grained conditions. Then, a proposed fine-grained control adapter gradually adds fine-grained instructions represented by action units (AUs), preventing unfavorable speech-lip synchronization. To disentangle coarse- and fine-grained conditions, we design a swap-label training mechanism, which enables the dominance of the fine-grained conditions. We also devise a mask-based CFG technique to regulate the occurrence and intensity of fine-grained control. In addition, a text-based detector is introduced with text-AU alignment to enable natural language user input and further support multimodal control. Extensive experimental results prove that Cafe-Talk achieves state-of-the-art lip synchronization and expressiveness performance and receives wide acceptance for fine-grained control in user studies. Project page: https://harryxd2018.github.io/cafe-talk/
Review for NeurIPS paper: Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition
Additional Feedback: The work is a good incremental step towards understanding the relationship between AU and FER, and their influence in detecting one over the other. Figure 1: I am assuming that the dotted lines represent back-propagation steps for each module. Please clarify this in the manuscript/figure. Sec 3.1: The explanation of using the generic knowledge as probabilities is not unique ([b]), and the use of only 8 AUs (there are many more) is not justified. When generating Table 1, it is important to note that these numbers are taken from studies which explored more AUs than are mentioned in the table.
Learning to Control an Android Robot Head for Facial Animation
Heisler, Marcel, Becker-Asano, Christian
The ability to display rich facial expressions is crucial for human-like robotic heads. While manually defining such expressions is intricate, there already exist approaches to automatically learn them. In this work, one such approach is applied to evaluate and control a robot head different from the one in the original study. To improve the mapping of facial expressions from human actors onto a robot head, it is proposed to use 3D landmarks and their pairwise distances as input to the learning algorithm instead of the previously used facial action units. Participants in an online survey preferred mappings from our proposed approach in most cases, though further improvements are still required.
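The proposed input representation (3D landmarks plus their pairwise distances) can be sketched in a few lines; the lexicographic pair ordering is an assumption, since the abstract does not specify one.

```python
import math
from itertools import combinations

def pairwise_distances(landmarks):
    """Flatten 3D landmarks into a pairwise-distance feature vector.

    landmarks: sequence of (x, y, z) tuples.
    Returns one Euclidean distance per unordered pair, in lexicographic
    order of the landmark indices.
    """
    return [math.dist(p, q) for p, q in combinations(landmarks, 2)]
```

Pairwise distances are invariant to the head's global rotation and translation, which is one plausible reason they transfer better from a human actor to a differently proportioned robot head than raw AU activations.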
Do LLMs Agree on the Creativity Evaluation of Alternative Uses?
Rabeyah, Abdullah Al, Góes, Fabrício, Volpe, Marco, Medeiros, Talles
This paper investigates whether large language models (LLMs) show agreement in assessing creativity in responses to the Alternative Uses Test (AUT). While LLMs are increasingly used to evaluate creative content, previous studies have primarily focused on a single model assessing responses generated by the same model or humans. This paper explores whether LLMs can impartially and accurately evaluate creativity in outputs generated by both themselves and other models. Using an oracle benchmark set of AUT responses, categorized by creativity level (common, creative, and highly creative), we experiment with four state-of-the-art LLMs evaluating these outputs. We test both scoring and ranking methods and employ two evaluation settings (comprehensive and segmented) to examine if LLMs agree on the creativity evaluation of alternative uses. Results reveal high inter-model agreement, with Spearman correlations averaging above 0.7 across models and reaching over 0.77 with respect to the oracle, indicating a high level of agreement and validating the reliability of LLMs in creativity assessment of alternative uses. Notably, models do not favour their own responses; instead, they provide similar creativity assessment scores or rankings for alternative uses generated by other models. These findings suggest that LLMs exhibit impartiality and high alignment in creativity evaluation, offering promising implications for their use in automated creativity assessment.
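The inter-model agreement figures above are Spearman rank correlations, which for tie-free rankings reduce to the classic formula 1 − 6·Σd²/(n(n² − 1)). A minimal sketch (the tie-free simplification is an assumption; the paper's exact computation may handle ties):

```python
def spearman(xs, ys):
    """Spearman rank correlation for two score lists (no tie handling)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A value above 0.7, as reported, means two models order the same set of alternative uses very similarly, even if their absolute creativity scores differ.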