AITopics | action unit

Saliency-guided Emotion Modeling: Predicting Viewer Reactions from Video Stimuli

Yaragoppa, Akhila, Siddharth

arXiv.org Artificial Intelligence, May-27-2025

Understanding the emotional impact of videos is crucial for applications in content creation, advertising, and Human-Computer Interaction (HCI). Traditional affective computing methods rely on self-reported emotions, facial expression analysis, and biosensing data, yet they often overlook the role of visual saliency -- the naturally attention-grabbing regions within a video. In this study, we utilize deep learning to introduce a novel saliency-based approach to emotion prediction by extracting two key features: saliency area and number of salient regions. Using the HD2S saliency model and OpenFace facial action unit analysis, we examine the relationship between video saliency and viewer emotions. Our findings reveal three key insights: (1) Videos with multiple salient regions tend to elicit high-valence, low-arousal emotions, (2) Videos with a single dominant salient region are more likely to induce low-valence, high-arousal responses, and (3) Self-reported emotions often misalign with facial expression-based emotion detection, suggesting limitations in subjective reporting. By leveraging saliency-driven insights, this work provides a computationally efficient and interpretable alternative for emotion modeling, with implications for content creation, personalized media experiences, and affective computing research.
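The two features named in the abstract, saliency area and number of salient regions, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a saliency model has already produced a per-frame map with values in [0, 1], and the `threshold` value, the 4-connectivity rule, and the function name `saliency_features` are all hypothetical choices made for illustration.

```python
from collections import deque


def saliency_features(saliency_map, threshold=0.5):
    """Return (area_fraction, region_count) for one frame's saliency map.

    saliency_map: 2D list of floats in [0, 1] (assumed model output).
    threshold:    hypothetical cut-off above which a pixel counts as salient.
    """
    rows = len(saliency_map)
    cols = len(saliency_map[0]) if rows else 0
    mask = [[value >= threshold for value in row] for row in saliency_map]

    # Saliency area: fraction of the frame covered by salient pixels.
    salient_pixels = sum(cell for row in mask for cell in row)
    area_fraction = salient_pixels / (rows * cols) if rows and cols else 0.0

    # Number of salient regions: 4-connected components, found by BFS.
    seen = [[False] * cols for _ in range(rows)]
    region_count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            region_count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return area_fraction, region_count


# Toy 3x4 frame with two separate salient blobs.
frame = [
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.9],
]
area, regions = saliency_features(frame)
```

Per-frame values like these could then be averaged over a video before being related to valence and arousal ratings; how the paper aggregates them is not stated in the abstract.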

artificial intelligence, machine learning, salient region, (19 more...)

2505.19178

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Feb-6-2025, 08:38:54 GMT

In this paper, the authors propose a novel recurrent convolutional encoder-decoder network that learns to apply out-of-plane rotations to 3D objects such as human faces and 3D chair models. The proposed network starts from a basic model whose encoder disentangles the input image into identity units and pose units. Action units are then applied to the pose units to control the rotation direction, and the decoder, which consists of convolution and upsampling layers, decodes the identity and pose units into an image of the rotated object and the corresponding object mask. To support longer rotation trajectories, the network is then extended with a recurrent architecture in which the encoded identity units of the input image are held fixed while the pose units are changed by a sequence of action units; finally, both identity and pose units are fed into the decoder to generate the result image. One of the main contributions of this paper is learning to disentangle the representations of the identity/appearance and pose factors, where the identity units are shown to be discriminative, view-invariant features in the cross-view object recognition task. In addition, this disentangling property benefits further, and the network predicts better renderings, when longer rotation trajectories are used in the curriculum training stages of the proposed recurrent convolutional encoder-decoder network.

author feedback and meta-review, pose unit, recurrent convolutional encoder-decoder network, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)

Fusion in Context: A Multimodal Approach to Affective State Recognition

Mohamed, Youssef, Lemaignan, Severin, Guneysu, Arzu, Jensfelt, Patric, Smith, Christian

arXiv.org Artificial Intelligence, Sep-18-2024

Accurate recognition of human emotions is a crucial challenge in affective computing and human-robot interaction (HRI). Emotional states play a vital role in shaping behaviors, decisions, and social interactions. However, emotional expressions can be influenced by contextual factors, leading to misinterpretations if context is not considered. Multimodal fusion, combining modalities like facial expressions, speech, and physiological signals, has shown promise in improving affect recognition. This paper proposes a transformer-based multimodal fusion approach that leverages facial thermal data, facial action units, and textual context information for context-aware emotion recognition. We explore modality-specific encoders to learn tailored representations, which are then fused using additive fusion and processed by a shared transformer encoder to capture temporal dependencies and interactions. The proposed method is evaluated on a dataset collected from participants engaged in a tangible tabletop Pacman game designed to induce various affective states. Our results demonstrate the effectiveness of incorporating contextual information and multimodal fusion for affective state recognition.
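As a rough illustration of the additive fusion step the abstract mentions, the sketch below projects each modality into a shared dimension and sums the results element-wise. It is not the paper's architecture: the linear projections stand in for the modality-specific encoders, the weight values and feature vectors are made-up toy numbers, and the function names `project` and `additive_fusion` are hypothetical. The shared transformer encoder that would consume the fused vectors is omitted.

```python
def project(features, weights):
    """Linear projection of a feature vector (length d_in) by a d_in x d_model matrix."""
    d_model = len(weights[0])
    return [
        sum(value * weights[i][j] for i, value in enumerate(features))
        for j in range(d_model)
    ]


def additive_fusion(embeddings):
    """Element-wise sum of modality embeddings that share one dimension."""
    if len({len(embedding) for embedding in embeddings}) != 1:
        raise ValueError("all modality embeddings must share one dimension")
    return [sum(values) for values in zip(*embeddings)]


# Toy inputs for one time step (all values are illustrative assumptions).
thermal = project([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
action_units = project([1.0, 0.0, 1.0], [[0.5, 0.0], [0.0, 0.5], [0.5, 1.0]])
context = project([2.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])

fused = additive_fusion([thermal, action_units, context])
```

Because the modalities are summed rather than concatenated, the fused vector keeps the shared dimension regardless of how many modalities are present, which is one common reason for choosing additive fusion.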

expression, modality, recognition, (15 more...)

2409.11906

Country:

Europe > Sweden > Västerbotten County > Umeå (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Using Quantum Solved Deep Boltzmann Machines to Increase the Data Efficiency of RL Agents

Kent, Daniel, O'Rourke, Clement, Southall, Jake, Duncan, Kirsty, Bedford, Adrian

arXiv.org Artificial Intelligence, Aug-30-2024

Deep Learning algorithms, such as those used in Reinforcement Learning, often require large quantities of data to train effectively. In most cases, the availability of data is not a significant issue. However, for some contexts, such as autonomous cyber defence, we require data-efficient methods. Recently, Quantum Machine Learning and Boltzmann Machines have been proposed as solutions to this challenge. In this work, we build upon pre-existing work to extend the use of Deep Boltzmann Machines to the cutting-edge algorithm Proximal Policy Optimisation in a Reinforcement Learning cyber defence environment. We show that this approach, when solved using a D-WAVE quantum annealer, can lead to a two-fold increase in data efficiency. We therefore expect it to be used by the machine learning and quantum communities who are hoping to capitalise on data-efficient Reinforcement Learning methods.

agent, boltzmann machine, neural network, (14 more...)

2408.1724

Country:

Europe > United Kingdom > England > North Yorkshire > Middlesbrough (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
Europe > United Kingdom > England > Bristol (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

Wang, Yuchi, Guo, Junliang, Bai, Jianhong, Yu, Runyi, He, Tianyu, Tan, Xu, Sun, Xu, Bian, Jiang

arXiv.org Artificial Intelligence, May-24-2024

Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable. In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars, offering fine-grained control, improved interactivity, and generalizability to the resulting video. Our framework, named InstructAvatar, leverages a natural language interface to control the emotion as well as the facial motion of avatars. Technically, we design an automatic annotation pipeline to construct an instruction-video paired training dataset, equipped with a novel two-branch diffusion-based generator to predict avatars with audio and text instructions at the same time. Experimental results demonstrate that InstructAvatar produces results that align well with both conditions, and outperforms existing methods in fine-grained emotion control, lip-sync quality, and naturalness. Our project page is https://wangyuchi369.github.io/InstructAvatar/.

emotion, instruction, video, (14 more...)

2405.15758

Country: Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Unmasking Parkinson's Disease with Smile: An AI-enabled Screening Framework

Adnan, Tariq, Islam, Md Saiful, Rahman, Wasifur, Lee, Sangwu, Tithi, Sutapa Dey, Noshin, Kazi, Sarker, Imran, Rahman, M Saifur, Hoque, Ehsan

arXiv.org Artificial Intelligence, Aug-3-2023

Parkinson's disease (PD) diagnosis remains challenging owing to the lack of a reliable biomarker and limited access to clinical care. In this study, we present an analysis of the largest video dataset containing micro-expressions to screen for PD. We collected 3,871 videos from 1,059 unique participants, including 256 self-reported PD patients. The recordings come from diverse sources, encompassing participants' homes across multiple countries, a clinic, and a PD care facility in the US. Leveraging facial landmarks and action units, we extracted features relevant to hypomimia, a prominent symptom of PD characterized by reduced facial expressions. An ensemble of AI models trained on these features achieved an accuracy of 89.7% and an Area Under the Receiver Operating Characteristic curve (AUROC) of 89.3% on held-out data, while being free from detectable bias across population subgroups based on sex and ethnicity. Further analysis reveals that features from the smiling videos alone lead to comparable performance, even on two external test sets the model has never seen during training, suggesting the potential for PD risk assessment from smiling selfie videos.

artificial intelligence, machine learning, participant, (19 more...)

2308.02588

Country:

Asia > Bangladesh (0.06)
North America > United States > Ohio (0.04)
North America > United States > Alaska (0.04)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

TextMI: Textualize Multimodal Information for Integrating Non-verbal Cues in Pre-trained Language Models

Hasan, Md Kamrul, Islam, Md Saiful, Lee, Sangwu, Rahman, Wasifur, Naim, Iftekhar, Khan, Mohammed Ibrahim, Hoque, Ehsan

arXiv.org Artificial Intelligence, Mar-29-2023

Pre-trained large language models have recently achieved ground-breaking performance in a wide variety of language understanding tasks. However, the same model cannot be applied to multimodal behavior understanding tasks (e.g., video sentiment/humor detection) unless non-verbal features (e.g., acoustic and visual) can be integrated with language. Jointly modeling multiple modalities significantly increases model complexity and makes the training process data-hungry. While an enormous amount of text data is available via the web, collecting large-scale multimodal behavioral video datasets is extremely expensive in terms of both time and money. In this paper, we investigate whether large language models alone can successfully incorporate non-verbal information when it is presented in textual form. We present a way to convert the acoustic and visual information into corresponding textual descriptions and concatenate them with the spoken text. We feed this augmented input to a pre-trained BERT model and fine-tune it on three downstream multimodal tasks: sentiment, humor, and sarcasm detection. Our approach, TextMI, significantly reduces model complexity, adds interpretability to the model's decision, and can be applied to a diverse set of tasks while achieving superior (multimodal sarcasm detection) or near-SOTA (multimodal sentiment analysis and multimodal humor detection) performance. We propose TextMI as a general, competitive baseline for multimodal behavioral analysis tasks, particularly in a low-resource setting.
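The core idea of turning non-verbal features into text and concatenating them with the transcript can be sketched as follows. This is only an illustration of the general approach, not TextMI's actual templates: the feature names (`pitch_hz`, `loudness_db`, `smile`, `brow_raise`), the thresholds, and the wording of the descriptions are all hypothetical.

```python
def describe_level(value, low, high, name):
    """Map a numeric feature to a coarse textual level (thresholds are assumptions)."""
    if value < low:
        return f"low {name}"
    if value > high:
        return f"high {name}"
    return f"moderate {name}"


def textualize(spoken_text, acoustic, visual):
    """Append a textual description of non-verbal cues to the spoken text."""
    cues = [
        describe_level(acoustic["pitch_hz"], 120.0, 220.0, "pitch"),
        describe_level(acoustic["loudness_db"], 50.0, 70.0, "loudness"),
    ]
    # Visual cues are treated as intensities in [0, 1]; 0.5 is an arbitrary cut-off.
    if visual.get("smile", 0.0) > 0.5:
        cues.append("smiling")
    if visual.get("brow_raise", 0.0) > 0.5:
        cues.append("raised eyebrows")
    return f"{spoken_text} Non-verbal cues: {', '.join(cues)}."


augmented = textualize(
    "Oh great, another meeting.",
    acoustic={"pitch_hz": 100.0, "loudness_db": 60.0},
    visual={"smile": 0.1, "brow_raise": 0.8},
)
# augmented == "Oh great, another meeting. Non-verbal cues: low pitch, moderate loudness, raised eyebrows."
```

The augmented string is plain text, so it could be passed to any text-only classifier; the tokenization and fine-tuning steps are left out here.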

large language model, machine learning, natural language, (19 more...)

2303.1543

Country:

Africa > Eswatini > Manzini > Manzini (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges

Kollias, Dimitrios, Tzirakis, Panagiotis, Baird, Alice, Cowen, Alan, Zafeiriou, Stefanos

arXiv.org Artificial Intelligence, Mar-20-2023

The fifth Affective Behavior Analysis in-the-wild (ABAW) Competition is part of the respective ABAW Workshop, which will be held in conjunction with the IEEE Computer Vision and Pattern Recognition Conference (CVPR), 2023. The 5th ABAW Competition is a continuation of the Competitions held at ECCV 2022, IEEE CVPR 2022, ICCV 2021, IEEE FG 2020 and CVPR 2017, and is dedicated to automatically analyzing affect. For this year's Competition, we feature two corpora: i) an extended version of the Aff-Wild2 database and ii) the Hume-Reaction dataset. The former is an audiovisual database of around 600 videos and around 3M frames, annotated with respect to: a) two continuous affect dimensions, valence (how positive/negative a person is) and arousal (how active/passive a person is); b) basic expressions (e.g., happiness, sadness, neutral state); and c) atomic facial muscle actions (i.e., action units). The latter is an audiovisual dataset in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities. Thus the 5th ABAW Competition encompasses four Challenges: i) uni-task Valence-Arousal Estimation, ii) uni-task Expression Classification, iii) uni-task Action Unit Detection, and iv) Emotional Reaction Intensity Estimation. In this paper, we present these Challenges along with their corpora, outline the evaluation metrics, and present the baseline systems and their performance.

artificial intelligence, dimitrios kollias, machine learning, (12 more...)

2303.01498

Country:

North America > United States > New York > New York County > New York City (0.05)
Africa (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.83)

ForDigitStress: A multi-modal stress dataset employing a digital job interview scenario

Heimerl, Alexander, Prajod, Pooja, Mertes, Silvan, Baur, Tobias, Kraus, Matthias, Liu, Ailin, Risack, Helen, Rohleder, Nicolas, André, Elisabeth, Becker, Linda

arXiv.org Artificial Intelligence, Mar-14-2023

We present a multi-modal stress dataset that uses digital job interviews to induce stress. The dataset provides multi-modal data of 40 participants, including audio, video (motion capturing, facial recognition, eye tracking), and physiological information (photoplethysmography, electrodermal activity). In addition, the dataset contains time-continuous annotations for stress and for the emotions that occurred (e.g., shame, anger, anxiety, surprise). To establish a baseline, five different machine learning classifiers (Support Vector Machine, K-Nearest Neighbors, Random Forest, Long Short-Term Memory Network) were trained and evaluated on the proposed dataset for a binary stress classification task. The best-performing classifier achieved an accuracy of 88.3% and an F1-score of 87.5%.
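For readers unfamiliar with how the two reported metrics differ, the sketch below computes accuracy and F1-score for a binary task from true and predicted labels. The labels are made-up toy data, not values from the dataset, and treating label 1 as the "stressed" class is an assumption for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, f1) for binary labels, with 1 as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy labels: 1 = stressed, 0 = not stressed (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, f1 = binary_metrics(y_true, y_pred)
```

Accuracy counts all correct predictions equally, whereas F1 ignores true negatives, so the two can diverge when the classes are imbalanced.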

artificial intelligence, machine learning, participant, (20 more...)

2303.07742

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Questionnaire & Opinion Survey (0.93)
Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Using Positive Matching Contrastive Loss with Facial Action Units to mitigate bias in Facial Expression Recognition

Suresh, Varsha, Ong, Desmond C.

arXiv.org Artificial Intelligence, Mar-8-2023

Machine learning models automatically learn discriminative features from the data and are therefore susceptible to learning strongly correlated biases, such as relying on protected attributes like gender and race. Most existing bias mitigation approaches aim to explicitly reduce the model's focus on these protected features. In this work, we propose to mitigate bias by explicitly guiding the model's focus towards task-relevant features using domain knowledge, and we hypothesize that this can indirectly reduce the model's dependence on spurious correlations it learns from the data. We explore bias mitigation in facial expression recognition systems using facial Action Units (AUs) as the task-relevant feature. To this end, we introduce Feature-based Positive Matching Contrastive Loss, which learns the distances between the positives of a sample based on the similarity between their corresponding AU embeddings. We compare our approach with representative baselines and show that incorporating task-relevant features via our method can improve model fairness at minimal cost to classification performance.

artificial intelligence, expression recognition, machine learning, (16 more...)

doi: 10.1109/ACII55700.2022.9953865

2303.04896

Country:

Asia > Singapore (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)