Overview
Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective
Hu, Guimin; Xin, Yi; Lyu, Weimin; Huang, Haojian; Sun, Chang; Zhu, Zhihong; Gui, Lin; Cai, Ruichu
Multimodal affective computing (MAC) has garnered increasing attention due to its broad applications in analyzing human behaviors and intentions, especially in the text-dominated multimodal affective computing field. This survey presents recent trends in multimodal affective computing from an NLP perspective through four hot tasks: multimodal sentiment analysis, multimodal emotion recognition in conversation, multimodal aspect-based sentiment analysis, and multimodal multi-label emotion recognition. The goal of this survey is to explore the current landscape of multimodal affective research, identify development trends, and highlight the similarities and differences across these tasks, offering a comprehensive report on recent progress in multimodal affective computing from an NLP perspective. For each task, the survey formalizes the task, reviews relevant works, describes benchmark datasets, and details the evaluation metrics. It also briefly discusses research in multimodal affective computing involving facial expressions, acoustic signals, physiological signals, and emotion causes. Finally, we discuss the technical approaches, challenges, and future directions in multimodal affective computing. To support further research, we have released a repository that compiles related works in multimodal affective computing, providing detailed resources and references for the community.
Modeling Information Narrative Detection and Evolution on Telegram during the Russia-Ukraine War
Gerard, Patrick; Volkova, Svitlana; Penafiel, Louis; Lerman, Kristina; Weninger, Tim
Following the Russian Federation's full-scale invasion of Ukraine in February 2022, a multitude of information narratives emerged within both pro-Russian and pro-Ukrainian communities online. As the conflict progresses, so too do the information narratives, constantly adapting and influencing local and global community perceptions and attitudes. The dynamic nature of this evolving information environment (IE) underscores a critical need to fully discern how narratives evolve and affect online communities. Existing research, however, often fails to capture information narrative evolution, overlooking both the fluid nature of narratives and the internal mechanisms that drive their evolution. Recognizing this, we introduce a novel approach designed both to model narrative evolution and to uncover the underlying mechanisms driving it. In this work, we perform a comparative discourse analysis across communities on Telegram covering the first three months following the invasion. First, we uncover substantial disparities in narratives and perceptions between pro-Russian and pro-Ukrainian communities. Then, we probe deeper into the prevalent narratives of each group, identifying key themes and examining the underlying mechanisms fueling their evolution. Finally, we explore influences and factors that may shape the development and spread of narratives.
The Critical Role of Effective Communication in Human-Robot Collaborative Assembly
Ferrari, Davide; Secchi, Cristian
In the rapidly evolving landscape of Human-Robot Collaboration (HRC), effective communication between humans and robots is crucial for complex task execution. Traditional request-response systems often lack naturalness and may hinder efficiency. This study emphasizes the importance of adopting human-like communication to enable fluent vocal interaction between human operators and robots in a simulated collaborative human-robot industrial assembly. We propose a novel approach that employs human-like interactions through natural dialogue, enabling human operators to engage in vocal conversations with robots. Through a comparative experiment, we demonstrate the efficacy of our approach in enhancing task performance and collaboration efficiency. The robot's ability to engage in meaningful vocal conversations enables it to seek clarification, provide status updates, and ask for assistance when required, leading to improved coordination and a smoother workflow. The results indicate that adopting human-like conversational interactions positively influences the human-robot collaborative dynamic: human operators find it easier to convey complex instructions and preferences, resulting in a more productive and satisfying collaboration experience.
Cyber Deception: State of the art, Trends and Open challenges
López, Pedro Beltrán; Pérez, Manuel Gil; Nespoli, Pantaleone
The growing interest in cybersecurity has significantly increased the number of articles designing and implementing various Cyber Deception (CYDEC) mechanisms. This trend reflects the urgent need for new strategies to address cyber threats effectively. Since its emergence, CYDEC has established itself as an innovative defense against attackers, thanks to its proactive and reactive capabilities, and has found applications in numerous real-life scenarios. Despite the considerable work devoted to CYDEC, the literature still presents significant gaps. In particular, there has not been (i) a comprehensive analysis of the main components characterizing CYDEC, (ii) a generic classification covering all types of solutions, or (iii) a survey of the current state of the literature in various contexts. This article aims to fill these gaps through a detailed review of the main features that comprise CYDEC, developing a comprehensive classification taxonomy. In addition, the different frameworks used to generate CYDEC are reviewed, and a more comprehensive one is presented. Existing CYDEC solutions in the literature, both with and without Artificial Intelligence (AI), are studied and compared. Finally, the most salient trends in the current state of the art are discussed, along with a list of pending challenges for future research.
Deep Learning Techniques for Hand Vein Biometrics: A Comprehensive Review
Hemis, Mustapha; Kheddar, Hamza; Bourouis, Sami; Saleem, Nasir
Biometric authentication has garnered significant attention as a secure and efficient method of identity verification. Among the various modalities, hand vein biometrics, including finger vein, palm vein, and dorsal hand vein recognition, offer unique advantages due to their high accuracy, low susceptibility to forgery, and non-intrusiveness. The vein patterns within the hand are highly complex and distinct for each individual, making them an ideal biometric identifier. Additionally, hand vein recognition is contactless, enhancing user convenience and hygiene compared to other modalities such as fingerprint or iris recognition. Furthermore, because the veins lie inside the body, they are less susceptible to damage or alteration, which enhances the security and reliability of the biometric system. Together, these factors make hand vein biometrics a highly effective and secure method for identity verification. This review examines the latest advances in deep learning techniques applied to finger vein, palm vein, and dorsal hand vein recognition. It covers all the essential fundamentals of hand vein biometrics, summarizes publicly available datasets, and discusses the state-of-the-art metrics used to evaluate the three modalities. Moreover, it provides a comprehensive overview of proposed approaches for finger, palm, dorsal, and multimodal vein recognition, offering insights into the best performance achieved, data augmentation techniques, and effective transfer learning methods, along with associated pretrained deep learning models. Finally, the review addresses open research challenges and outlines future directions and perspectives, encouraging researchers to enhance existing methods and propose innovative techniques.
Towards Understanding Human Emotional Fluctuations with Sparse Check-In Data
Shah, Sagar Paresh; Wu, Ga; Kortschot, Sean W.; Daviau, Samuel
Data sparsity is a key challenge limiting the power of AI tools across various domains. It is a critical barrier to harnessing the full potential of AI in domains that require active user input rather than measurements derived from automated sensors, such as self-reported mood check-ins, where capturing a continuous picture of emotional states is essential. In this context, sparse data can hinder efforts to capture the nuances of individual emotional experiences, such as causes, triggers, and contributing factors. Existing methods for addressing data scarcity often rely on heuristics or large established datasets, favoring deep learning models that lack adaptability to new domains. This paper proposes a novel probabilistic framework that integrates user-centric, feedback-based learning, allowing for personalized predictions despite limited data. Achieving 60% accuracy in predicting user states among 64 options (against a chance rate of 1/64), this framework effectively mitigates data sparsity. It is versatile across various applications, bridging the gap between theoretical AI research and practical deployment.
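The accuracy figure is easier to judge against its baseline: guessing uniformly among 64 states is right 1/64 of the time (about 1.6%), so 60% accuracy is roughly 38 times chance. A minimal sketch of that arithmetic follows; the function names are hypothetical and not taken from the paper, and the only inputs are the two numbers reported in the abstract.

```python
"""Back-of-the-envelope check: reported accuracy vs. a uniform-guessing baseline."""


def chance_accuracy(num_states: int) -> float:
    """Expected accuracy of guessing uniformly at random among num_states options."""
    if num_states <= 0:
        raise ValueError("num_states must be positive")
    return 1.0 / num_states


def lift_over_chance(accuracy: float, num_states: int) -> float:
    """Ratio of an observed accuracy to the uniform-guessing baseline."""
    return accuracy / chance_accuracy(num_states)


if __name__ == "__main__":
    # Figures from the abstract: 64 possible user states, 60% accuracy.
    print(f"chance accuracy:  {chance_accuracy(64):.2%}")
    print(f"lift over chance: {lift_over_chance(0.60, 64):.1f}x")
```

The uniform baseline is the one implied by "chance of 1/64"; a majority-class baseline would be higher if some states are reported far more often than others.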
A Survey of Multimodal Composite Editing and Retrieval
Li, Suyan; Huang, Fuxiang; Zhang, Lei
In the real world, where information is abundant and diverse across different modalities, understanding and utilizing various data types to improve retrieval systems is a key focus of research. Multimodal composite retrieval integrates diverse modalities such as text, images, and audio to provide more accurate, personalized, and contextually relevant results. To facilitate a deeper understanding of this promising direction, this survey explores multimodal composite editing and retrieval in depth, covering image-text composite editing, image-text composite retrieval, and other forms of multimodal composite retrieval. We systematically organize the application scenarios, methods, benchmarks, experiments, and future directions. Multimodal learning is a hot topic in the era of large models, and several surveys on multimodal learning and on vision-language models with transformers have been published in the PAMI journal. To the best of our knowledge, this survey is the first comprehensive review of the literature on multimodal composite retrieval, making it a timely complement to existing reviews of multimodal fusion. To help readers quickly track this field, we have built a project page for this survey, which can be found at https://github.com/fuxianghuang1/Multimodal-Composite-Editing-and-Retrieval.
Advancements in Gesture Recognition Techniques and Machine Learning for Enhanced Human-Robot Interaction: A Comprehensive Review
Hussain, Sajjad; Saeed, Khizer; Baimagambetov, Almas; Rab, Shanay; Saad, Md
In recent years, robots have become an important part of our day-to-day lives, with various applications. Human-robot interaction (HRI), which enables people to interact and communicate with robots, has had a positive impact on the field of robotics. Gesture recognition techniques combined with machine learning algorithms have shown remarkable progress in recent years, particularly in HRI. This paper comprehensively reviews the latest advancements in gesture recognition methods and their integration with machine learning approaches to enhance HRI. Furthermore, it presents vision-based gesture recognition with depth-sensing systems for safe and reliable human-robot interaction, and analyses the role of machine learning algorithms such as deep learning, reinforcement learning, and transfer learning in improving the accuracy and robustness of gesture recognition systems for effective communication between humans and robots.
Social Mediation through Robots -- A Scoping Review on Improving Group Interactions through Directed Robot Action using an Extended Group Process Model
Weisswange, Thomas H.; Javed, Hifza; Dietrich, Manuel; Jung, Malte F.; Jamali, Nawid
Group processes refer to the dynamics that occur within a group and are critical for understanding how groups function. With robots increasingly placed within small groups, improving these processes has emerged as an important application of social robotics. Social Mediation Robots elicit behavioral change within groups by deliberately influencing group processes. While research in this field has demonstrated that robots can effectively affect interpersonal dynamics, there is a notable gap in integrating these insights into a coherent understanding and theory. We present a scoping review of literature targeting changes in social interactions between multiple humans through intentional action by robotic agents. To guide our review, we adapt the classical Input-Process-Output (I-P-O) models into what we call the "Mediation I-P-O model". We evaluated 1633 publications, which yielded 89 distinct social mediation concepts. We construct 11 mediation approaches that robots can use to shape processes in small groups and teams. This work strives to produce generalizable insights and to evaluate the extent to which the potential of social mediation through robots has been realized thus far. We hope that the proposed framework encourages a holistic approach to the study of social mediation and provides a foundation for standardizing future reporting in the domain.
A Primer on Variational Inference for Physics-Informed Deep Generative Modelling
Glyn-Davies, Alex; Vadeboncoeur, Arnaud; Akyildiz, O. Deniz; Kazlauskaite, Ieva; Girolami, Mark
Variational inference (VI) is a computationally efficient and scalable methodology for approximate Bayesian inference. It strikes a balance between the accuracy of uncertainty quantification and practical tractability. It excels at generative modelling and inversion tasks due to its built-in Bayesian regularisation and flexibility, essential qualities for physics-related problems. The derivation of VI's central learning objective must often be tailored to each new learning task, since the nature of the problem, such as those arising in physics, dictates the conditional dependence between the variables of interest. In this paper, we provide an accessible and thorough technical introduction to VI for forward and inverse problems, guiding the reader through standard derivations of the VI framework and how it can best be realized through deep learning. We then review and unify recent literature exemplifying the creative flexibility allowed by VI. This paper is designed for a general scientific audience looking to solve physics-based problems with an emphasis on uncertainty quantification.
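As a point of reference for the "standard derivations" mentioned above, the usual objective in VI is the evidence lower bound (ELBO). The notation here is generic and chosen for illustration, not necessarily the paper's: observations y, latent variables z, prior p(z), likelihood p(y | z), and a variational distribution q_φ(z) with parameters φ. Rewriting the marginal likelihood as an expectation under q_φ and applying Jensen's inequality gives:

```latex
\begin{aligned}
\log p(y) &= \log \int p(y \mid z)\, p(z)\, \mathrm{d}z
           = \log \mathbb{E}_{q_\phi(z)}\!\left[\frac{p(y \mid z)\, p(z)}{q_\phi(z)}\right] \\
          &\ge \mathbb{E}_{q_\phi(z)}\!\left[\log p(y \mid z)\right]
           - \mathrm{KL}\!\left(q_\phi(z) \,\|\, p(z)\right)
           =: \mathcal{L}(\phi)
\end{aligned}
```

The gap log p(y) - L(φ) equals KL(q_φ(z) || p(z | y)), so maximizing L(φ) both tightens the bound and pulls q_φ toward the true posterior. When a problem imposes a different conditional dependence structure between variables, the likelihood and prior terms factorize differently, which is why the objective has to be re-derived for each task.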