"Multimodal sentiment analysis" is a group of methods that constitute the gold standard for AI dialog systems with sentiment detection. These methods automatically analyze a person's psychological state from their speech, voice color, facial expression, and posture, and they are crucial for human-centered AI systems. The technique could potentially realize an emotionally intelligent AI with beyond-human capabilities, one that understands the user's sentiment and generates a response accordingly. However, current emotion estimation methods focus only on observable information and do not account for the information contained in unobservable signals, such as physiological signals. Such signals are a potentially rich source of emotional information that could substantially improve sentiment estimation performance.
Machine learning, a branch of artificial intelligence (AI), has transformed speech and language recognition technology. A new study published in IEEE Transactions on Affective Computing by researchers affiliated with the Japan Advanced Institute of Science and Technology (JAIST) and Osaka University demonstrates human-like sentiment sensing by machine learning models that use physiological data. Emotional intelligence, or emotional quotient (EQ), refers to a person's ability to understand and manage emotions in order to build relationships, resolve conflicts, and manage stress, among other activities. Applied machine learning practitioners are striving to integrate more human-like traits, such as EQ, into conversational AI chatbots, virtual assistants, and similar tools for customer service, sales, and other functions. According to Allied Market Research, the worldwide conversational AI market is projected to reach $32.6 billion by 2030, with a compound annual growth rate of 20 percent from 2021 to 2030.
Skowron, Marcin (Austrian Research Institute for Artificial Intelligence) | Pirker, Hannes (Austrian Research Institute for Artificial Intelligence) | Rank, Stefan (Austrian Research Institute for Artificial Intelligence) | Paltoglou, Georgios (Wolverhampton University) | Ahn, Junghyun (Virtual Reality Lab, EPFL) | Gobron, Stephane (Virtual Reality Lab, EPFL)
The aim of this paper is threefold: (1) it explores methods for the detection of affective states in text, (2) it presents the usage of such affective cues in a conversational system and (3) it evaluates its effectiveness in a virtual reality setting. Valence and arousal values, used for generating facial expressions of users' avatars, are also incorporated into the dialog, helping to bridge the gap between textual and visual modalities. The system is evaluated in terms of its ability to: (i) generate a realistic dialog, (ii) create an enjoyable chatting experience, and (iii) establish an emotional connection with participants. Results show that user ratings for the conversational agent match those obtained in a Wizard of Oz setting.
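As a rough illustration of the idea in this abstract, the sketch below estimates valence and arousal from text and maps the result to a coarse facial-expression label for an avatar. It is a minimal example under stated assumptions, not the authors' method: the lexicon `TOY_LEXICON`, its numeric values, the functions `estimate_affect` and `expression_for`, and the quadrant-based labels are all hypothetical and chosen only to make the example self-contained.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (an assumption, not from the paper):
# word -> (valence, arousal), each in the range [-1, 1].
TOY_LEXICON: Dict[str, Tuple[float, float]] = {
    "happy": (0.8, 0.5),
    "great": (0.7, 0.4),
    "calm": (0.4, -0.6),
    "sad": (-0.7, -0.4),
    "angry": (-0.8, 0.8),
    "bored": (-0.4, -0.7),
}


def estimate_affect(utterance: str) -> Tuple[float, float]:
    """Average the valence and arousal of all lexicon words in the utterance.

    Returns (0.0, 0.0) when no lexicon word is found.
    """
    # Lowercase each whitespace-separated token and strip edge punctuation.
    tokens: List[str] = [t.strip(".,!?;:").lower() for t in utterance.split()]
    hits = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    if not hits:
        return (0.0, 0.0)
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return (valence, arousal)


def expression_for(valence: float, arousal: float) -> str:
    """Map a valence/arousal pair to a coarse, hypothetical expression label."""
    if valence == 0.0 and arousal == 0.0:
        return "neutral"
    if valence >= 0:
        return "joyful" if arousal >= 0 else "relaxed"
    return "angry" if arousal >= 0 else "sad"


if __name__ == "__main__":
    for text in ["I am so happy!", "I feel sad and bored.", "See you at noon."]:
        v, a = estimate_affect(text)
        print(f"{text!r}: valence={v:.2f}, arousal={a:.2f}, "
              f"expression={expression_for(v, a)}")
```

A real system would replace the toy lexicon with a trained classifier or an established affective lexicon; the point here is only that the same two numbers can feed both the dialog logic and the avatar's facial expression, which is how the abstract describes linking the textual and visual modalities.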
"Multimodal sentiment analysis" is a group of methods making up the gold standard for AI dialog systems with sentiment detection, and they can automatically analyze a person's psychological state from their speech, facial expressions, voice color, and posture. They are fundamental to creating human-centered AI systems and could lead to the development of an emotionally intelligent AI with "beyond-human capabilities." These capabilities would help the AI understand the user's sentiment before forming an appropriate response.
We present ADVISER - an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g. emotion recognition, engagement level prediction and backchanneling) conversational agents. The final Python-based implementation of our toolkit is flexible, easy to use, and easy to extend not only for technically experienced users, such as machine learning researchers, but also for less technically experienced users, such as linguists or cognitive scientists, thereby providing a flexible platform for collaborative research. Link to open-source code: https://github.com/DigitalPhonetics/adviser