Collaborating Authors


Challenges of Artificial Intelligence -- From Machine Learning and Computer Vision to Emotional Intelligence Artificial Intelligence

Artificial intelligence (AI) has become a part of everyday conversation and our lives. It is considered as the new electricity that is revolutionizing the world. AI is heavily invested in both industry and academy. However, there is also a lot of hype in the current AI debate. AI based on so-called deep learning has achieved impressive results in many problems, but its limits are already visible. AI has been under research since the 1940s, and the industry has seen many ups and downs due to over-expectations and related disappointments that have followed. The purpose of this book is to give a realistic picture of AI, its history, its potential and limitations. We believe that AI is a helper, not a ruler of humans. We begin by describing what AI is and how it has evolved over the decades. After fundamentals, we explain the importance of massive data for the current mainstream of artificial intelligence. The most common representations for AI, methods, and machine learning are covered. In addition, the main application areas are introduced. Computer vision has been central to the development of AI. The book provides a general introduction to computer vision, and includes an exposure to the results and applications of our own research. Emotions are central to human intelligence, but little use has been made in AI. We present the basics of emotional intelligence and our own research on the topic. We discuss super-intelligence that transcends human understanding, explaining why such achievement seems impossible on the basis of present knowledge,and how AI could be improved. Finally, a summary is made of the current state of AI and what to do in the future. In the appendix, we look at the development of AI education, especially from the perspective of contents at our own university.

How Can AI Recognize Pain and Express Empathy Artificial Intelligence

Sensory and emotional experiences such as pain and empathy are relevant to mental and physical health. The current drive for automated pain recognition is motivated by a growing number of healthcare requirements and demands for social interaction make it increasingly essential. Despite being a trending area, they have not been explored in great detail. Over the past decades, behavioral science and neuroscience have uncovered mechanisms that explain the manifestations of pain. Recently, also artificial intelligence research has allowed empathic machine learning methods to be approachable. Generally, the purpose of this paper is to review the current developments for computational pain recognition and artificial empathy implementation. Our discussion covers the following topics: How can AI recognize pain from unimodality and multimodality? Is it necessary for AI to be empathic? How can we create an AI agent with proactive and reactive empathy? This article explores the challenges and opportunities of real-world multimodal pain recognition from a psychological, neuroscientific, and artificial intelligence perspective. Finally, we identify possible future implementations of artificial empathy and analyze how humans might benefit from an AI agent equipped with empathy.

Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition Artificial Intelligence

In recent years, speech emotion recognition technology is of great significance in industrial applications such as call centers, social robots and health care. The combination of speech recognition and speech emotion recognition can improve the feedback efficiency and the quality of service. Thus, the speech emotion recognition has been attracted much attention in both industry and academic. Since emotions existing in an entire utterance may have varied probabilities, speech emotion is likely to be ambiguous, which poses great challenges to recognition tasks. However, previous studies commonly assigned a single-label or multi-label to each utterance in certain. Therefore, their algorithms result in low accuracies because of the inappropriate representation. Inspired by the optimally interacting theory, we address the ambiguous speech emotions by proposing a novel multi-classifier interactive learning (MCIL) method. In MCIL, multiple different classifiers first mimic several individuals, who have inconsistent cognitions of ambiguous emotions, and construct new ambiguous labels (the emotion probability distribution). Then, they are retrained with the new labels to interact with their cognitions. This procedure enables each classifier to learn better representations of ambiguous data from others, and further improves the recognition ability. The experiments on three benchmark corpora (MAS, IEMOCAP, and FAU-AIBO) demonstrate that MCIL does not only improve each classifier's performance, but also raises their recognition consistency from moderate to substantial.

Generating Emotionally Aligned Responses in Dialogues using Affect Control Theory Artificial Intelligence

State-of-the-art neural dialogue systems excel at syntactic and semantic modelling of language, but often have a hard time establishing emotional alignment with the human interactant during a conversation. In this work, we bring Affect Control Theory (ACT), a socio-mathematical model of emotions for human-human interactions, to the neural dialogue generation setting. ACT makes predictions about how humans respond to emotional stimuli in social situations. Due to this property, ACT and its derivative probabilistic models have been successfully deployed in several applications of Human-Computer Interaction, including empathetic tutoring systems, assistive healthcare devices and two-person social dilemma games. We investigate how ACT can be used to develop affect-aware conversational agents, which produce emotionally aligned responses to prompts and take into consideration the affective identities of the interactants.

Exploiting multi-CNN features in CNN-RNN based Dimensional Emotion Recognition on the OMG in-the-wild Dataset Machine Learning

This paper presents a novel CNN-RNN based approach, which exploits multiple CNN features for dimensional emotion recognition in-the-wild, utilizing the One-Minute Gradual-Emotion (OMG-Emotion) dataset. Our approach includes first pre-training with the relevant and large in size, Aff-Wild and Aff-Wild2 emotion databases. Low-, mid- and high-level features are extracted from the trained CNN component and are exploited by RNN subnets in a multi-task framework. Their outputs constitute an intermediate level prediction; final estimates are obtained as the mean or median values of these predictions. Fusion of the networks is also examined for boosting the obtained performance, at Decision-, or at Model-level; in the latter case a RNN was used for the fusion. Our approach, although using only the visual modality, outperformed state-of-the-art methods that utilized audio and visual modalities. Some of our developments have been submitted to the OMG-Emotion Challenge, ranking second among the technologies which used only visual information for valence estimation; ranking third overall. Through extensive experimentation, we further show that arousal estimation is greatly improved when low-level features are combined with high-level ones.

A Personalized Affective Memory Neural Model for Improving Emotion Recognition Artificial Intelligence

Recent models of emotion recognition strongly rely on supervised deep learning solutions for the distinction of general emotion expressions. However, they are not reliable when recognizing online and personalized facial expressions, e.g., for person-specific affective understanding. In this paper, we present a neural model based on a conditional adversarial autoencoder to learn how to represent and edit general emotion expressions. We then propose Grow-When-Required networks as personalized affective memories to learn individualized aspects of emotion expressions. Our model achieves state-of-the-art performance on emotion recognition when evaluated on \textit{in-the-wild} datasets. Furthermore, our experiments include ablation studies and neural visualizations in order to explain the behavior of our model.

Applying Probabilistic Programming to Affective Computing Artificial Intelligence

Affective Computing is a rapidly growing field spurred by advancements in artificial intelligence, but often, held back by the inability to translate psychological theories of emotion into tractable computational models. To address this, we propose a probabilistic programming approach to affective computing, which models psychological-grounded theories as generative models of emotion, and implements them as stochastic, executable computer programs. We first review probabilistic approaches that integrate reasoning about emotions with reasoning about other latent mental states (e.g., beliefs, desires) in context. Recently-developed probabilistic programming languages offer several key desidarata over previous approaches, such as: (i) flexibility in representing emotions and emotional processes; (ii) modularity and compositionality; (iii) integration with deep learning libraries that facilitate efficient inference and learning from large, naturalistic data; and (iv) ease of adoption. Furthermore, using a probabilistic programming framework allows a standardized platform for theory-building and experimentation: Competing theories (e.g., of appraisal or other emotional processes) can be easily compared via modular substitution of code followed by model comparison. To jumpstart adoption, we illustrate our points with executable code that researchers can easily modify for their own models. We end with a discussion of applications and future directions of the probabilistic programming approach.

Unsupervised Learning in Reservoir Computing for EEG-based Emotion Recognition Artificial Intelligence

In real-world applications such as emotion recognition from recorded brain activity, data are captured from electrodes over time. These signals constitute a multidimensional time series. In this paper, Echo State Network (ESN), a recurrent neural network with a great success in time series prediction and classification, is optimized with different neural plasticity rules for classification of emotions based on electroencephalogram (EEG) time series. Actually, the neural plasticity rules are a kind of unsupervised learning adapted for the reservoir, i.e. the hidden layer of ESN. More specifically, an investigation of Oja's rule, BCM rule and gaussian intrinsic plasticity rule was carried out in the context of EEG-based emotion recognition. The study, also, includes a comparison of the offline and online training of the ESN. When testing on the well-known affective benchmark "DEAP dataset" which contains EEG signals from 32 subjects, we find that pretraining ESN with gaussian intrinsic plasticity enhanced the classification accuracy and outperformed the results achieved with an ESN pretrained with synaptic plasticity. Four classification problems were conducted in which the system complexity is increased and the discrimination is more challenging, i.e. inter-subject emotion discrimination. Our proposed method achieves higher performance over the state of the art methods.

Improving speech emotion recognition via Transformer-based Predictive Coding through transfer learning Machine Learning

Speech emotion recognition is an important aspect of human-computer interaction. Prior works propose various transfer learning approaches to deal with limited samples in speech emotion recognition. However, they require labeled data for the source task, which cost much effort to collect them. To solve this problem, we focus on the unsupervised task, predictive coding. Nearly unlimited data for most domains can be utilized. In this paper, we utilize the multi-layer Transformer model for the predictive coding, followed with transfer learning approaches to share knowledge of the pre-trained predictive model for speech emotion recognition. We conduct experiments on IEMOCAP, and experimental results reveal the advantages of the proposed method. Our method reaches 65.03% in the weighted accuracy, which also outperforms some currently advanced approaches.

Meet These Incredible Women Advancing A.I. Research


A world renowned pioneer in social robotics, Cynthia Breazeal splits her time as an Associate Professor at MIT, where she received her PhD and founded the Personal Robots Group, and Founder and Chief Scientist of Jibo, a personal robotics company with over $85 million in funding. While Breazeal's work has won numerous academic awards, industry accolades, and media attention, she had to fight early skepticism in the 1990s from other experts in robotics and AI. At the time, robots were seen as physical and industrial tools, not social or emotional companions. Her first social robot, Kismet, was unfairly called out in popular press as "useless". Breazeal bucked the trend with a very different vision: "I wanted to create robots with social and emotional intelligence that could work in collaborative partnership with people. In 2-5 years, I see social robots helping families with things that really matter, like education, health, eldercare, entertainment, and companionship." She hopes her work and influence will inspire others to create robots "not only with smarts, but with heart, too."