Goto

Collaborating Authors

 Kopp, Stefan


Towards a GENEA Leaderboard -- an Extended, Living Benchmark for Evaluating and Advancing Conversational Motion Synthesis

arXiv.org Artificial Intelligence

Current evaluation practices in speech-driven gesture generation lack standardisation and focus on aspects that are easy to measure over aspects that actually matter. This leads to a situation where it is impossible to know what is the state of the art, or to know which method works better for which purpose when comparing two publications. In this position paper, we review and give details on issues with existing gesture-generation evaluation, and present a novel proposal for remedying them. Specifically, we announce an upcoming living leaderboard to benchmark progress in conversational motion synthesis. Unlike earlier gesture-generation challenges, the leaderboard will be updated with large-scale user studies of new gesture-generation systems multiple times per year, and systems on the leaderboard can be submitted to any publication venue that their authors prefer. By evolving the leaderboard evaluation data and tasks over time, the effort can keep driving progress towards the most important end goals identified by the community. We actively seek community involvement across the entire evaluation pipeline: from data and tasks for the evaluation, via tooling, to the systems evaluated. In other words, our proposal will not only make it easier for researchers to perform good evaluations, but their collective input and contributions will also help drive the future of gesture-generation research.


AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

arXiv.org Artificial Intelligence

The generation of realistic and contextually relevant co-speech gestures is a challenging yet increasingly important task in the creation of multimodal artificial agents. Prior methods focused on learning a direct correspondence between co-speech gesture representations and produced motions, which created seemingly natural but often unconvincing gestures during human assessment. We present an approach to pre-train partial gesture sequences using a generative adversarial network with a quantization pipeline. The resulting codebook vectors serve as both input and output in our framework, forming the basis for the generation and reconstruction of gestures. By learning the mapping of a latent space representation as opposed to directly mapping it to a vector representation, this framework facilitates the generation of highly realistic and expressive gestures that closely replicate human movement and behavior, while simultaneously avoiding artifacts in the generation process. We evaluate our approach by comparing it with established methods for generating co-speech gestures as well as with existing datasets of human behavior. We also perform an ablation study to assess our findings. The results show that our approach outperforms the current state of the art by a clear margin and is partially indistinguishable from human gesturing. We make our data pipeline and the generation framework publicly available.


Towards autonomous artificial agents with an active self: modeling sense of control in situated action

arXiv.org Artificial Intelligence

In this paper we present a computational modeling account of an active self in artificial agents. In particular we focus on how an agent can be equipped with a sense of control and how it arises in autonomous situated action and, in turn, influences action control. We argue that this requires laying out an embodied cognitive model that combines bottom-up processes (sensorimotor learning and fine-grained adaptation of control) with top-down processes (cognitive processes for strategy selection and decision-making). We present such a conceptual computational architecture based on principles of predictive processing and free energy minimization. Using this general model, we describe how a sense of control can form across the levels of a control hierarchy and how this can support action control in an unpredictable environment. We present an implementation of this model as well as first evaluations in a simulated task scenario, in which an autonomous agent has to cope with un-/predictable situations and experiences corresponding sense of control. We explore different model parameter settings that lead to different ways of combining low-level and high-level action control. The results show the importance of appropriately weighting information in situations where the need for low/high-level action control varies and they demonstrate how the sense of control can facilitate this.


Resonating Minds -- Emergent Collaboration Through Hierarchical Active Inference

arXiv.org Artificial Intelligence

Working together on complex collaborative tasks requires agents to coordinate their actions. Doing this explicitly or completely prior to the actual interaction is not always possible nor sufficient. Agents also need to continuously understand the current actions of others and quickly adapt their own behavior appropriately. Here we investigate how efficient, automatic coordination processes at the level of mental states (intentions, goals), which we call belief resonance, can lead to collaborative situated problem-solving. We present a model of hierarchical active inference for collaborative agents (HAICA). It combines efficient Bayesian Theory of Mind processes with a perception-action system based on predictive processing and active inference. Belief resonance is realized by letting the inferred mental states of one agent influence another agent's predictive beliefs about its own goals and intentions. This way, the inferred mental states influence the agent's own task behavior without explicit collaborative reasoning. We implement and evaluate this model in the Overcooked domain, in which two agents with varying degrees of belief resonance team up to fulfill meal orders. Our results demonstrate that agents based on HAICA achieve a team performance comparable to recent state of the art approaches, while incurring much lower computational costs. We also show that belief resonance is especially beneficial in settings were the agents have asymmetric knowledge about the environment. The results indicate that belief resonance and active inference allow for quick and efficient agent coordination, and thus can serve as a building block for collaborative cognitive agents.


Satisficing Mentalizing: Bayesian Models of Theory of Mind Reasoning in Scenarios with Different Uncertainties

arXiv.org Artificial Intelligence

The ability to interpret the mental state of another agent based on its behavior, also called Theory of Mind (ToM), is crucial for humans in any kind of social interaction. Artificial systems, such as intelligent assistants, would also greatly benefit from such mentalizing capabilities. However, humans and systems alike are bound by limitations in their available computational resources. This raises the need for satisficing mentalizing, reconciling accuracy and efficiency in mental state inference that is good enough for a given situation. In this paper, we present different Bayesian models of ToM reasoning and evaluate them based on actual human behavior data that were generated under different kinds of uncertainties. We propose a Switching approach that combines specialized models, embodying simplifying presumptions, in order to achieve a more statisficing mentalizing compared to a Full Bayesian ToM model.


A predictive processing model of perception and action for self-other distinction

arXiv.org Artificial Intelligence

During interaction with others, we perceive and produce social actions in close temporal distance or even simultaneously. It has been argued that the motor system is involved in perception and action, playing a fundamental role in the handling of actions produced by oneself and by others. But how does it distinguish in this processing between self and other, thus contributing to self-other distinction? In this paper we propose a hierarchical model of sensorimotor coordination based on principles of perception-action coupling and predictive processing in which self-other distinction arises during action and perception. For this we draw on mechanisms assumed for the integration of cues for a sense of agency, i.e., the sense that an action is self-generated. We report results from simulations of different scenarios, showing that the model is not only able to minimize free energy during perception and action, but also showing that the model can correctly attribute sense of agency to own actions.


A Hybrid Grammar-Based Approach for Learning and Recognizing Natural Hand Gestures

AAAI Conferences

In this paper, we present a hybrid grammar formalism designed to learn structured models of natural iconic gesture performances that allow for compressed representation and robust recognition. We analyze a dataset of iconic gestures and show how the proposed Feature-based Stochastic Context-Free Grammar (FSCFG) can generalize over both structural and feature-based variations among different gesture performances.


Report on Representations for Multimodal Generation Workshop

AI Magazine

The Representations for Multimodal Generation Workshop was held on April 23-25, 2005, at Reykjavik University, Reykjavik, Iceland. The overall goal of this workshop is to further the state of research on multimodal generation by enabling (and getting) people in the field to work together on building systems capable of real-time face-to-face dialog with people. This report summarizes the activities and progress of that meeting.