The one and only reason why businesses are turning to automatic emotion detection is you! Emotion sensing technologies are expanding exponentially. Market researchers estimate the Emotion Detection & Recognition (EDR) business to grow at a compound annual growth rate (CAGR) of 27.20–39.9%, One of the most common ways to automatically recognize emotions is via facial detection in photos and videos. The list of softwares or APIs that allow you to do that keeps on getting longer.
While several approaches to face emotion recognition task are proposed in literature, none of them reports on power consumption nor inference time required to run the system in an embedded environment. Without adequate knowledge about these factors it is not clear whether we are actually able to provide accurate face emotion recognition in the embedded environment or not, and if not, how far we are from making it feasible and what are the biggest bottlenecks we face. The main goal of this paper is to answer these questions and to convey the message that instead of reporting only detection accuracy also power consumption and inference time should be reported as real usability of the proposed systems and their adoption in human computer interaction strongly depends on it. In this paper, we identify the state-of-the art face emotion recognition methods that are potentially suitable for embedded environment and the most frequently used datasets for this task. Our study shows that most of the performed experiments use datasets with posed expressions or in a particular experimental setup with special conditions for image collection. Since our goal is to evaluate the performance of the identified promising methods in the realistic scenario, we collect a new dataset with non-exaggerated emotions and we use it, in addition to the publicly available datasets, for the evaluation of detection accuracy, power consumption and inference time on three frequently used embedded devices with different computational capabilities. Our results show that gray images are still more suitable for embedded environment than color ones and that for most of the analyzed systems either inference time or energy consumption or both are limiting factor for their adoption in real-life embedded applications.
The final step for many artificial intelligence (AI) researchers is the development of a system that can identify human emotion from voice and facial expressions. While some facial scanning technology is available, there is still a long way to go in terms of properly identifying emotional states due to the complexity of nuances in speech as well as facial muscle movement. The University of Science and Technology researchers in Hefei, China, believe that they have made a breakthrough. Their paper, "Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition," expresses how an AI system may be able to recognize human emotion through state-of-the-art accuracy on a popular benchmark. In their published paper, the researchers say, "Automatic emotion recognition (AER) is a challenging task due to the abstract concept and multiple expressions of emotion. Inspired by this cognitive process in human beings, it's natural to simultaneously utilize audio and visual information in AER … The whole pipeline can be completed in a neural network."
Emotion recognition has become an important field of research in Human Computer Interactions as we improve upon the techniques for modelling the various aspects of behaviour. With the advancement of technology our understanding of emotions are advancing, there is a growing need for automatic emotion recognition systems. One of the directions the research is heading is the use of Neural Networks which are adept at estimating complex functions that depend on a large number and diverse source of input data. In this paper we attempt to exploit this effectiveness of Neural networks to enable us to perform multimodal Emotion recognition on IEMOCAP dataset using data from Speech, Text, and Motion capture data from face expressions, rotation and hand movements. Prior research has concentrated on Emotion detection from Speech on the IEMOCAP dataset, but our approach is the first that uses the multiple modes of data offered by IEMOCAP for a more robust and accurate emotion detection.
In this paper, we propose a Concept-level Emotion Cause Model (CECM), instead of the mere word-level models, to discover causes of microblogging users' diversified emotions on specific hot event. A modified topic-supervised biterm topic model is utilized in CECM to detect emotion topics' in event-related tweets, and then context-sensitive topical PageRank is utilized to detect meaningful multiword expressions as emotion causes. Experimental results on a dataset from Sina Weibo, one of the largest microblogging websites in China, show CECM can better detect emotion causes than baseline methods.