"Guess what I'm doing": Extending legibility to sequential decision tasks
Faria, Miguel, Melo, Francisco S., Paiva, Ana
Interaction between humans and agents/robots can greatly benefit from each party's ability to reason about the other's intentions: inferring what the other is trying to do and what its objectives are. In the human-robot interaction (HRI) literature, several works have explored the communication of intentions using speech [1, 2], gaze [3, 4], and movement [5, 6]. In this work we address the problem of conveying intention through action, which is closely related to the aforementioned works on communicating intention through movement. In particular, we are interested in the notion of legibility, introduced by Dragan et al. [7], which measures the extent to which a user can infer the goal of a robot by observing a snippet of the robot's movement. A legible movement is characterized not by its efficiency in reaching the goal, but by its distinctiveness, i.e., by how well it disambiguates the actual goal of the movement from other potential goals. In the original work of Dragan et al. [7], legibility is expressed as the probability of the goal given the movement, i.e., L(movement) = P(goal | movement snippet). Legibility has been widely explored in human-robot interaction to improve a robot's expressiveness through movement [5]. More recently, several works have extended the notion of legibility to domains other than robotic motion. The focus on improving the transparency and explainability of machine systems has been one of the main drivers for the application of legibility beyond robotic motion [8].
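As a worked illustration of this definition, the following minimal sketch computes P(goal | movement snippet) over candidate goals under a cost-based observer model in the spirit of Dragan et al. [7]; the function name, the cost values, and the two-goal example are illustrative assumptions, not code or data from the paper.

```python
import numpy as np

def goal_posterior(snippet_cost, cost_to_go, total_cost, goals):
    """P(goal | snippet) under an exp(-cost) observer model with a
    uniform prior over goals: a goal is probable when completing the
    observed snippet toward it costs little more than going there directly."""
    scores = {}
    for g in goals:
        scores[g] = np.exp(-(snippet_cost + cost_to_go[g])) / np.exp(-total_cost[g])
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}

# Hypothetical example: a snippet that has already veered toward goal A
# makes A the far more legible interpretation.
posterior = goal_posterior(
    snippet_cost=2.0,
    cost_to_go={"A": 1.0, "B": 4.0},   # optimal cost from snippet end to each goal
    total_cost={"A": 2.5, "B": 3.0},   # optimal cost from the start to each goal
    goals=["A", "B"],
)
print(posterior)  # P(A) ~ 0.92, P(B) ~ 0.08
```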
Building Persuasive Robots with Social Power Strategies
Hashemian, Mojgan, Couto, Marta, Mascarenhas, Samuel, Paiva, Ana, Santos, Pedro A., Prada, Rui
Can social power endow social robots with the capacity to persuade? This paper presents our recent endeavor to design persuasive social robots. We designed and ran three different user studies to investigate the effectiveness of different bases of social power (inspired by French and Raven's theory) on people's compliance with the requests of social robots. The results show that robotic persuaders that exert social power (specifically from the expert, reward, and coercion bases) demonstrate an increased ability to influence humans. The first study provides a positive answer and shows that, under the same circumstances, people with different personalities prefer robots using a specific social power base. In addition, social rewards can be useful in persuading individuals. The second study suggests that, by employing social power, social robots are capable of objectively persuading people to select a less desirable choice among others. Finally, the third study shows that the effect of power on persuasion does not decay over time and might strengthen under specific circumstances. Moreover, exerting stronger social power does not necessarily lead to higher persuasion. Overall, we argue that the results of these studies are relevant for designing human-robot interaction scenarios, especially those aiming at behavioral change.
Centralized Training with Hybrid Execution in Multi-Agent Reinforcement Learning
Santos, Pedro P., Carvalho, Diogo S., Vasco, Miguel, Sardinha, Alberto, Santos, Pedro A., Paiva, Ana, Melo, Francisco S.
We introduce hybrid execution in multi-agent reinforcement learning (MARL), a new paradigm in which agents aim to successfully complete cooperative tasks with arbitrary communication levels at execution time by taking advantage of information-sharing among the agents. Under hybrid execution, the communication level can range from a setting in which no communication is allowed between agents (fully decentralized), to a setting featuring full communication (fully centralized), but the agents do not know beforehand which communication level they will encounter at execution time. To formalize our setting, we define a new class of multi-agent partially observable Markov decision processes (POMDPs) that we name hybrid-POMDPs, which explicitly model a communication process between the agents. We contribute MARO, an approach that makes use of an auto-regressive predictive model, trained in a centralized manner, to estimate missing agents' observations at execution time. We evaluate MARO on standard scenarios and extensions of previous benchmarks tailored to emphasize the negative impact of partial observability in MARL. Experimental results show that our method consistently outperforms relevant baselines, allowing agents to act with faulty communication while successfully exploiting shared information.
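The sketch below illustrates the core mechanism described in the abstract: a recurrent predictive model, trained in a centralized manner, imputes the observations of agents that fail to communicate at execution time. The class and function names, network sizes, and the masking convention are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class ObsPredictor(nn.Module):
    """Sketch of an auto-regressive observation predictor: given the history
    of joint observations, predict the next joint observation so that the
    entries of non-communicating agents can be filled in."""

    def __init__(self, n_agents, obs_dim, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(n_agents * obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_agents * obs_dim)

    def forward(self, joint_obs_seq):
        # joint_obs_seq: (batch, time, n_agents * obs_dim)
        h, _ = self.rnn(joint_obs_seq)
        return self.head(h)  # prediction of the next joint observation

def impute(joint_obs, predicted, mask):
    """Keep real observations where communication succeeded (mask == 1)
    and substitute model predictions where it failed (mask == 0)."""
    return mask * joint_obs + (1 - mask) * predicted
```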
GMC -- Geometric Multimodal Contrastive Representation Learning
Poklukar, Petra, Vasco, Miguel, Yin, Hang, Melo, Francisco S., Paiva, Ana, Kragic, Danica
Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method composed of two main components: i) a two-level architecture consisting of modality-specific base encoders, which process an arbitrary number of modalities into intermediate representations of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.
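A minimal sketch of the two-level design and the alignment loss described above, assuming simple linear encoders and an NT-Xent-style contrastive objective; the class names, layer choices, and temperature are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelEncoder(nn.Module):
    """Per-modality base encoders map each modality to a fixed-size
    intermediate representation; a single shared projection head maps
    all intermediates into the common latent space."""

    def __init__(self, input_dims, inter_dim=128, latent_dim=64):
        super().__init__()
        self.bases = nn.ModuleList(nn.Linear(d, inter_dim) for d in input_dims)
        self.shared = nn.Sequential(nn.ReLU(), nn.Linear(inter_dim, latent_dim))

    def forward(self, modalities):
        # one unit-norm latent per modality, all via the same shared head
        return [F.normalize(self.shared(base(x)), dim=-1)
                for base, x in zip(self.bases, modalities)]

def alignment_loss(z_mod, z_joint, temperature=0.1):
    """Contrastive loss pulling a modality's latent toward the latent of the
    complete (all-modality) observation of the same sample, and away from
    other samples in the batch (positives on the diagonal)."""
    logits = z_mod @ z_joint.t() / temperature
    targets = torch.arange(z_mod.size(0))
    return F.cross_entropy(logits, targets)
```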
How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents
Vasco, Miguel, Yin, Hang, Melo, Francisco S., Paiva, Ana
This work addresses the problem of sensing the world: how to learn a multimodal representation of a reinforcement learning agent's environment that allows the execution of tasks under incomplete perceptual conditions. To address this problem, we argue for hierarchy in the design of representation models and contribute a novel multimodal representation model, MUSE. The proposed model learns hierarchical representations: low-level modality-specific representations, encoded from raw observation data, and a high-level multimodal representation, encoding joint-modality information to allow robust state estimation. We employ MUSE as the sensory representation model of deep reinforcement learning agents provided with multimodal observations in Atari games. We perform a comparative study over different designs of reinforcement learning agents, showing that MUSE allows agents to perform tasks under incomplete perceptual experience with minimal performance loss. Finally, we evaluate the performance of MUSE in literature-standard multimodal scenarios with a larger number of more complex modalities, showing that it outperforms state-of-the-art multimodal variational autoencoders in single- and cross-modality generation.
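The sketch below conveys the hierarchical idea in the abstract: low-level encoders produce modality-specific latents, and a high-level encoder fuses whichever latents are available into one multimodal state. The averaging fusion, layer choices, and names are assumptions made for illustration, not the MUSE architecture itself.

```python
import torch
import torch.nn as nn

class HierarchicalMultimodalEncoder(nn.Module):
    """Two-tier encoder: modality-specific latents at the bottom, a single
    multimodal state at the top, robust to missing modalities."""

    def __init__(self, input_dims, low_dim=32, high_dim=64):
        super().__init__()
        self.low = nn.ModuleList(nn.Linear(d, low_dim) for d in input_dims)
        self.high = nn.Linear(low_dim, high_dim)

    def forward(self, modalities, available):
        # modalities: list of tensors; available: list of bools per modality.
        # Assumes at least one modality is observed.
        lows = [enc(x) for enc, x, ok in zip(self.low, modalities, available) if ok]
        fused = torch.stack(lows, dim=0).mean(dim=0)  # simple average fusion
        return self.high(fused)  # high-level state fed to the RL agent
```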
MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning
Vasco, Miguel, Melo, Francisco S., Paiva, Ana
Humans are able to create rich representations of their external reality. Their internal representations allow for cross-modality inference, where available perceptions can induce the perceptual experience of missing input modalities. In this paper, we contribute the Multimodal Hierarchical Variational Auto-encoder (MHVAE), a hierarchical multimodal generative model for representation learning. Inspired by human cognitive models, the MHVAE is able to learn modality-specific distributions for an arbitrary number of modalities, as well as a joint-modality distribution responsible for cross-modality inference. We formally derive the model's evidence lower bound and propose a novel methodology to approximate the joint-modality posterior based on modality-specific representation dropout. We evaluate the MHVAE on standard multimodal datasets. Our model performs on par with other state-of-the-art generative models regarding joint-modality reconstruction from arbitrary input modalities and cross-modality inference.
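As a hedged illustration of the representation-dropout idea named above: during training, each modality-specific representation is randomly dropped so the joint-modality posterior learns to operate from arbitrary modality subsets. The function below is a minimal sketch under that assumption; it is not the paper's formulation.

```python
import random
import torch

def modality_dropout(latents, p=0.5, training=True):
    """Randomly zero out each modality-specific representation with
    probability p during training, so downstream joint-posterior layers
    learn cross-modality inference from partial inputs."""
    if not training:
        return latents
    kept = [z if random.random() > p else torch.zeros_like(z) for z in latents]
    if all(not k.any() for k in kept):      # keep at least one modality alive
        i = random.randrange(len(latents))
        kept[i] = latents[i]
    return kept
```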
Software architecture for YOLO, a creativity-stimulating robot
Alves-Oliveira, Patrícia, Gomes, Samuel, Chandak, Ankita, Arriaga, Patrícia, Hoffman, Guy, Paiva, Ana
YOLO is a social robot designed and developed to stimulate creativity in children through storytelling activities. Children use it as a character in their stories. This article details the artificial intelligence software developed for YOLO. The implemented software cycles through several Creativity Behaviors to find the ones that stimulate creativity most effectively. YOLO can choose between convergent and divergent thinking techniques, two important processes of creative thought. These techniques were developed based on psychological theories of creativity development and on research from creativity experts who work with children. Additionally, this software allows the creation of Social Behaviors that enable the robot to behave as a believable character. On top of our framework, we built three main social behavior parameters: Exuberant, Aloof, and Harmonious. These behaviors are meant to ease immersive play and the process of character creation. The three social behaviors were based on psychological theories of personality and developed using children's input during co-design studies. Overall, this work presents an attempt to design, develop, and deploy social robots that nurture intrinsic human abilities, such as the ability to be creative.
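To make the behavior-scheduling idea concrete, here is a small sketch of one way such a scheduler could be organized: the robot tracks which creativity mode currently works best and expresses the chosen technique through the active social profile. The behavior names follow the article; the selection logic, technique names, and data structures are assumptions for illustration only.

```python
import random

# Hypothetical catalog: convergent vs. divergent creativity techniques.
CREATIVITY_TECHNIQUES = {
    "divergent": ["propose_new_character", "suggest_unexpected_place"],
    "convergent": ["recall_story_element", "combine_previous_ideas"],
}
SOCIAL_PROFILES = ["Exuberant", "Aloof", "Harmonious"]

def next_behavior(profile, effectiveness):
    """Pick the creativity mode currently judged most effective and a
    technique within it, expressed through the active social profile."""
    assert profile in SOCIAL_PROFILES
    mode = max(effectiveness, key=effectiveness.get)
    technique = random.choice(CREATIVITY_TECHNIQUES[mode])
    return {"profile": profile, "mode": mode, "technique": technique}

# Example: divergent techniques have been working better so far.
print(next_behavior("Harmonious", {"divergent": 0.7, "convergent": 0.4}))
```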
Learning multimodal representations for sample-efficient recognition of human actions
Vasco, Miguel, Melo, Francisco S., de Matos, David Martins, Paiva, Ana, Inamura, Tetsunari
Humans interact in rich and diverse ways with the environment. However, the representation of such behavior by artificial agents is often limited. In this work we present motion concepts, a novel multimodal representation of human actions in a household environment. A motion concept encompasses a probabilistic description of the kinematics of the action along with its contextual background, namely the location and the objects held during the performance. Furthermore, we present Online Motion Concept Learning (OMCL), a new algorithm which learns novel motion concepts from action demonstrations and recognizes previously learned motion concepts. The algorithm is evaluated on a virtual-reality household environment with the presence of a human avatar. OMCL outperforms standard motion recognition algorithms on a one-shot recognition task, attesting to its potential for sample-efficient recognition of human actions.
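A minimal sketch of what a motion concept could look like as a data structure, combining a probabilistic kinematics model with its contextual background (location and held objects) as described above; the field names and scoring rule are illustrative assumptions, not the paper's definition.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class MotionConcept:
    """One learned action: a probabilistic description of its kinematics
    plus distributions over the context in which it is performed."""
    name: str
    kinematics_mean: np.ndarray   # e.g. mean trajectory, shape (timesteps, joints)
    kinematics_cov: np.ndarray    # uncertainty over that trajectory
    location_probs: dict = field(default_factory=dict)  # P(location | action)
    object_probs: dict = field(default_factory=dict)    # P(held object | action)

def context_score(concept, location, held_object):
    # Contextual likelihood, combined with the kinematic match at recognition
    # time; unseen contexts get a small floor probability.
    return (concept.location_probs.get(location, 1e-3)
            * concept.object_probs.get(held_object, 1e-3))
```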
Engineering Pro-Sociality With Autonomous Agents
Paiva, Ana (IST, INESC-ID, University of Lisbon) | Santos, Fernando P. (IST, INESC-ID, University of Lisbon) | Santos, Francisco C. (IST, INESC-ID, University of Lisbon)
This paper envisions a future where autonomous agents are used to foster and support pro-social behavior in a hybrid society of humans and machines. Pro-social behavior occurs when people and agents perform costly actions that benefit others. Acts such as helping others voluntarily, donating to charity, providing information, or sharing resources are all forms of pro-social behavior. We discuss two questions that challenge a purely utilitarian view of human decision making and contextualize its role in hybrid societies: i) What are the conditions and mechanisms that lead societies of agents and humans to be more pro-social? ii) How can we engineer autonomous entities (agents and robots) that lead to more altruistic and cooperative behaviors in a hybrid society? We propose using social simulations, game theory, population dynamics, and studies with people in virtual or real environments (with robots) where both agents and humans interact. This research will constitute the basis for establishing the foundations of the new field of Pro-social Computing, aiming at understanding, predicting, and promoting pro-sociality among humans through artificial agents and multiagent systems.
A Social Robot as a Card Game Player
Correia, Filipa (INESC-ID and Universidade de Lisboa) | Alves-Oliveira, Patrícia (Instituto Universitário de Lisboa and INESC-ID) | Ribeiro, Tiago (INESC-ID and Universidade de Lisboa) | Melo, Francisco S. (INESC-ID and Universidade de Lisboa) | Paiva, Ana (INESC-ID and Universidade de Lisboa)
This paper describes a social robotic game player that is able to successfully play a team card game called Sueca. The question we address in this paper is: how can we build a social robot player that balances its ability to play the card game with natural and social behaviours towards its partner and its opponents? The first challenge we faced concerned the development of a competent artificial player for a hidden-information game, whose time constraint is the average human decision time. To meet this requirement, the Perfect Information Monte Carlo (PIMC) algorithm was used. Further, we have analysed this algorithm's possible parametrizations for game trees that cannot be fully explored in a reasonable amount of time with a minimax search. Additionally, given the nature of the Sueca game, such a robotic player must master the social interactions both as a partner and as an opponent. To do that, an emotional agent framework (FAtiMA) was used to build the emotional and social behaviours of the robot. At each moment, the robot not only plays competitively but also appraises the situation and responds emotionally in a natural manner. To test the approach, we conducted a user study and compared the levels of trust participants attributed to robots and to human partners. Results showed that the robot team exhibited a winning rate of 60%. Concerning the social aspects, the results also showed that human players increased their trust in the robot as their game partner (similar to the way trust levels change towards human partners).
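For readers unfamiliar with PIMC, the sketch below shows its general shape: sample plausible deals of the hidden cards, solve each resulting perfect-information game (e.g. with minimax), and pick the move with the best average outcome. The function arguments are illustrative placeholders; the paper's parametrization (number of samples, search depth) is tuned to the average human decision time.

```python
import random

def pimc_choose(state, legal_moves, sample_hidden, solve_perfect_info, n_samples=50):
    """Perfect Information Monte Carlo in outline: average each move's value
    over sampled determinizations of the opponents' hidden cards.

    sample_hidden(state)                 -> one consistent deal of hidden cards
    solve_perfect_info(state, deal, m)   -> value of move m under that deal,
                                            e.g. via a minimax search
    """
    totals = {m: 0.0 for m in legal_moves}
    for _ in range(n_samples):
        deal = sample_hidden(state)          # one determinization
        for move in legal_moves:
            totals[move] += solve_perfect_info(state, deal, move)
    # choose the move with the highest average value across deals
    return max(totals, key=lambda m: totals[m] / n_samples)
```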