Agents
Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities
Wich, Alexander, Schultheis, Holger, Beetz, Michael
A key challenge for robotic systems is to infer the behavior of another agent. The capability to draw correct inferences is crucial to derive human behavior from examples. Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally (observational evidence). For this reason, robots that rely on correlational inferences risk a biased interpretation of the evidence. We propose equipping robots with the necessary tools to conduct observational studies on people. Specifically, we propose and explore the feasibility of structural causal models with non-parametric estimators to derive empirical estimates on hand behavior in the context of object manipulation in a virtual kitchen scenario. In particular, we focus on inferences under (the weaker) conditions of partial confounding (the model covering only some factors) and confront estimators with hundreds of samples instead of the typical order of thousands. Studying these conditions explores the boundaries of the approach and its viability. Despite the challenging conditions, the estimates inferred from the validation data are correct. Moreover, these estimates are stable under three refutation strategies, and four estimators are in agreement. Furthermore, the causal quantity for two individuals reveals the sensitivity of the approach in detecting positive and negative effects. The validity, stability and explainability of the approach are encouraging and serve as the foundation for further research.
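As an illustration of what a non-parametric estimate under observed confounding can look like, the sketch below applies the standard backdoor adjustment by stratification: the effect is computed within each level of a confounder and then averaged by stratum frequency. The function name `backdoor_ate`, the row layout `(confounder, treatment, outcome)` and the toy numbers are assumptions made for this example; the abstract does not state which estimators or variables the authors use.

```python
from collections import defaultdict
from statistics import mean


def backdoor_ate(rows):
    """Stratified (backdoor-adjusted) average treatment effect.

    rows: iterable of (z, t, y) with a discrete confounder z,
    a binary treatment t in {0, 1}, and a numeric outcome y.
    Hypothetical helper for illustration only.
    """
    rows = list(rows)
    strata = defaultdict(lambda: {0: [], 1: []})
    for z, t, y in rows:
        strata[z][t].append(y)

    ate = 0.0
    for z, groups in strata.items():
        if not groups[0] or not groups[1]:
            # Without both arms in a stratum the contrast is undefined.
            raise ValueError(f"stratum {z!r} lacks treated or control samples")
        weight = (len(groups[0]) + len(groups[1])) / len(rows)
        ate += weight * (mean(groups[1]) - mean(groups[0]))
    return ate


# Toy data: the confounder z shifts both treatment assignment and outcome.
toy_rows = [
    (0, 1, 3.0), (0, 1, 5.0), (0, 0, 1.0),
    (1, 1, 10.0), (1, 0, 4.0), (1, 0, 6.0),
]
adjusted = backdoor_ate(toy_rows)
naive = mean(y for _, t, y in toy_rows if t == 1) - mean(
    y for _, t, y in toy_rows if t == 0
)
```

On this toy data the adjusted estimate and the naive difference in means disagree, which is the kind of bias a purely correlational reading would carry.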
Exploiting Semantic Epsilon Greedy Exploration Strategy in Multi-Agent Reinforcement Learning
Multi-agent reinforcement learning (MARL) can model many real-world applications. However, many MARL approaches rely on epsilon greedy for exploration, which may discourage visiting advantageous states in hard scenarios. In this paper, we propose a new approach QMIX(SEG) for tackling MARL. It makes use of the value function factorization method QMIX to train per-agent policies and a novel Semantic Epsilon Greedy (SEG) exploration strategy. SEG is a simple extension to the conventional epsilon greedy exploration strategy, yet it is experimentally shown to greatly improve the performance of MARL. We first cluster actions into groups of actions with similar effects and then use the groups in a bi-level epsilon greedy exploration hierarchy for action selection. We argue that SEG facilitates semantic exploration by exploring in the space of groups of actions, which have richer semantic meanings than atomic actions. Experiments show that QMIX(SEG) substantially outperforms QMIX and leads to strong performance competitive with current state-of-the-art MARL approaches on the StarCraft Multi-Agent Challenge (SMAC) benchmark.
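The bi-level selection described above can be sketched roughly as follows. This is one possible reading, not the authors' implementation: it assumes the action groups are already given, that a group is scored by the best Q-value among its members, and that the same epsilon is used at both levels. The name `seg_select` and the toy action names are hypothetical.

```python
import random


def seg_select(q_values, groups, epsilon, rng=None):
    """Two-level epsilon-greedy pick: first a group, then an action in it.

    q_values: dict mapping action -> estimated value.
    groups:   list of non-empty lists of actions (assumed precomputed clusters).
    """
    rng = rng or random.Random()

    # Level 1: explore over groups, or take the group holding the best action.
    if rng.random() < epsilon:
        group = rng.choice(groups)
    else:
        group = max(groups, key=lambda g: max(q_values[a] for a in g))

    # Level 2: explore inside the chosen group, or act greedily within it.
    if rng.random() < epsilon:
        return rng.choice(group)
    return max(group, key=lambda a: q_values[a])


# Hypothetical per-agent values and clusters of actions with similar effects.
q = {"move_north": 0.1, "move_south": 0.4, "attack_1": 0.9, "attack_2": 0.2}
clusters = [["move_north", "move_south"], ["attack_1", "attack_2"]]
greedy_action = seg_select(q, clusters, epsilon=0.0, rng=random.Random(0))
```

With `epsilon=0.0` the procedure reduces to ordinary greedy selection; with a positive epsilon, a random draw at the group level moves the agent to a different kind of behavior rather than a different atomic action.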
Constructing games on networks for controlling the inequalities in the capital distribution
The inequality in capital or resource distribution is among the important phenomena observed in populations. The sources of inequality and methods for controlling it are of practical interest. To study this phenomenon, we introduce a model of interaction between agents in the network designed for reducing the inequality in the distribution of capital. To achieve the effect of inequality reduction, we interpret the outcome of the elementary game played in the network such that winning the game is translated into a reduction of the inequality. We study different interpretations of the introduced scheme and their impact on the behaviour of agents in terms of the capital distribution, and we provide examples based on the capital-dependent Parrondo's paradox. The results presented in this study provide insight into the mechanics of inequality formation in society.
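For readers unfamiliar with the capital-dependent Parrondo games mentioned above, here is a minimal sketch of the textbook formulation: game A is a slightly unfair coin, while the coin used in game B depends on whether the player's capital is divisible by 3. The probabilities and the bias `EPS` are the commonly quoted textbook values and are an assumption here; the abstract does not give the parameters or the network scheme used in the paper.

```python
import random

EPS = 0.005  # small bias against the player (assumed textbook value)


def win_probability(game, capital, eps=EPS):
    """Winning probability of one round for a player holding `capital`."""
    if game == "A":
        return 0.5 - eps
    if game == "B":
        # The "bad" coin is used when capital is a multiple of 3.
        base = 0.1 if capital % 3 == 0 else 0.75
        return base - eps
    raise ValueError(f"unknown game {game!r}")


def play_round(game, capital, rng):
    """Play one round: capital goes up by 1 on a win, down by 1 on a loss."""
    won = rng.random() < win_probability(game, capital)
    return capital + 1 if won else capital - 1


# Example: alternate the two games for a few rounds from zero capital.
rng = random.Random(0)
capital = 0
for step in range(10):
    capital = play_round("A" if step % 2 == 0 else "B", capital, rng)
```

Each game is losing on its own in this formulation, yet suitable alternations of A and B can yield a positive drift, which is the paradox the examples in the paper build on.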
Probe-Based Interventions for Modifying Agent Behavior
Tucker, Mycal, Kuhl, William, Shahid, Khizer, Karten, Seth, Sycara, Katia, Shah, Julie
Neural nets are powerful function approximators, but the behavior of a given neural net, once trained, cannot be easily modified. We wish, however, for people to be able to influence neural agents' actions despite the agents never training with humans, which we formalize as a human-assisted decision-making problem. Inspired by prior art initially developed for model explainability, we develop a method for updating representations in pre-trained neural nets according to externally-specified properties. In experiments, we show how our method may be used to improve human-agent team performance for a variety of neural networks from image classifiers to agents in multi-agent reinforcement learning settings.
Learning for Collaboration, Not Competition
Jakob Foerster, an accredited Machine Learning Research Scientist who has been at the forefront of research on Multi-Agent Learning, speaks with interviewer Kegan Strawn. Dr. Foerster explains why incorporating uncertainty into multi-agent interactions is essential to creating robust algorithms that can operate not only in games but in real-world applications. Jakob Foerster is an Associate Professor at the University of Oxford. His papers have gained prestigious awards at top machine learning conferences (ICML, AAAI) and have helped push deep multi-agent reinforcement learning to the forefront of AI research. Jakob previously worked at Facebook AI Research and received his Ph.D. from the University of Oxford under the supervision of Shimon Whiteson.
Public Information Representation for Adversarial Team Games
Carminati, Luca, Cacciamani, Federico, Ciccone, Marco, Gatti, Nicola
The peculiarity of adversarial team games resides in the asymmetric information available to the team members during the play, which makes the equilibrium computation problem hard even with zero-sum payoffs. The algorithms available in the literature work with implicit representations of the strategy space and mainly resort to Linear Programming and column generation techniques to incrementally enlarge the strategy space. Such representations prevent the adoption of standard tools such as abstraction generation, game solving, and subgame solving, which have proven crucial when solving huge, real-world two-player zero-sum games. In contrast to these works, we answer the question of whether there is any suitable game representation enabling the adoption of those tools. In particular, our algorithms convert a sequential team game with adversaries to a classical two-player zero-sum game. In this converted game, the team is transformed into a single coordinator player who only knows information common to the whole team and prescribes to the players an action for any possible private state. Interestingly, we show that our game is more expressive than the original extensive-form game as any state/action abstraction of the extensive-form game can be captured by our representation, while the reverse does not hold. Due to the NP-hard nature of the problem, the resulting Public Team game may be exponentially larger than the original one. To limit this explosion, we provide three algorithms, each returning an information-lossless abstraction that dramatically reduces the size of the tree. These abstractions can be produced without generating the original game tree. Finally, we show the effectiveness of the proposed approach by presenting experimental results on Kuhn and Leduc Poker games, obtained by applying state-of-the-art algorithms for two-player zero-sum games on the converted games.
Online Active Learning with Dynamic Marginal Gain Thresholding
Werner, Mariel A., Angelopoulos, Anastasios, Bates, Stephen, Jordan, Michael I.
The blessing of ubiquitous data also comes with a curse: the communication, storage, and labeling of massive, mostly redundant datasets. In our work, we seek to solve the problem at its source, collecting only valuable data and throwing out the rest, via active learning. We propose an online algorithm which, given any stream of data, any assessment of its value, and any formulation of its selection cost, extracts the most valuable subset of the stream up to a constant factor while using minimal memory. Notably, our analysis also holds for the federated setting, in which multiple agents select online from individual data streams without coordination and with potentially very different appraisals of cost. One particularly important use case is selecting and labeling training sets from unlabeled collections of data that maximize the test-time performance of a given classifier. In prediction tasks on ImageNet and MNIST, we show that our selection method outperforms random selection by up to 5-20%.
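The selection rule suggested by the title can be sketched as a single pass over the stream that keeps an item only when its marginal gain in value, relative to its cost, clears a threshold that may change as the selection grows. This is a generic illustration, not the paper's algorithm: the function `select_online`, the coverage-style value function, and the threshold schedule are all hypothetical choices, and the abstract does not specify how the threshold is set.

```python
def coverage_value(selected):
    """Toy value function: number of distinct elements covered so far."""
    return len(set().union(*selected))


def select_online(stream, value, cost, threshold):
    """Single-pass selection; each item is seen once and never revisited.

    value(list_of_items) -> float, cost(item) -> float,
    threshold(list_of_items) -> float (allowed to depend on what is kept).
    """
    selected = []
    current = value(selected)
    for item in stream:
        gain = value(selected + [item]) - current
        # Keep the item only if its gain per unit cost clears the threshold.
        if gain >= threshold(selected) * cost(item):
            selected.append(item)
            current += gain
    return selected


stream = [{1, 2}, {2}, {3}, {1, 2, 3}, {4, 5}]
unit_cost = lambda item: 1.0

# Constant threshold: anything adding at least one new element is kept.
kept_fixed = select_online(stream, coverage_value, unit_cost, lambda s: 1.0)

# Rising threshold (hypothetical schedule): later items must add more.
kept_rising = select_online(stream, coverage_value, unit_cost, lambda s: 1.0 + len(s))
```

Only the kept items and the running value are stored, which mirrors the low-memory, one-pass setting described in the abstract.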
Language Generation for Broad-Coverage, Explainable Cognitive Systems
This paper describes recent progress on natural language generation (NLG) for language-endowed intelligent agents (LEIAs) developed within the OntoAgent cognitive architecture. The approach draws heavily from past work on natural language understanding in this paradigm: it uses the same knowledge bases, theory of computational linguistics, agent architecture, and methodology of developing broad-coverage capabilities over time while still supporting near-term applications.
Aerospace Human System Integration Evolution over the Last 40 Years
This chapter focuses on the evolution of Human-Centered Design (HCD) in aerospace systems over the last forty years. Human Factors and Ergonomics first shifted from the study of physical and medical issues to cognitive issues circa the 1980s. The advent of computers brought with it the development of human-computer interaction (HCI), which then expanded into the field of digital interaction design and User Experience (UX). We ended up with the concept of interactive cockpits, not because pilots interacted with mechanical things, but because they interacted using pointing devices on computer displays. Since the early 2000s, complexity and organizational issues gained prominence to the point that complex systems design and management found itself center stage, with the spotlight on the role of the human element and organizational setups. Today, Human Systems Integration (HSI) is no longer only a single-agent problem, but a multi-agent research field. Systems are systems of systems, considered as representations of people and machines. They are made of statically and dynamically articulated structures and functions. When they are at work, they are living organisms that generate emerging functions and structures that need to be considered in evolution (i.e., in their constant redesign). More specifically, this chapter will focus on human factors such as human-centered systemic representations, life-critical systems, organizational issues, complexity management, modeling and simulation, flexibility, tangibility and autonomy. The discussion will be based on several examples in civil aviation and air combat, as well as aerospace.
Dynamics-Aware Comparison of Learned Reward Functions
Wulfe, Blake, Balakrishna, Ashwin, Ellis, Logan, Mercat, Jean, McAllister, Rowan, Gaidon, Adrien
The ability to learn reward functions plays an important role in enabling the deployment of intelligent agents in the real world. However, comparing reward functions, for example as a means of evaluating reward learning methods, presents a challenge. Reward functions are typically compared by considering the behavior of optimized policies, but this approach conflates deficiencies in the reward function with those of the policy search algorithm used to optimize it. To address this challenge, Gleave et al. (2020) propose the Equivalent-Policy Invariant Comparison (EPIC) distance. EPIC avoids policy optimization, but in doing so requires computing reward values at transitions that may be impossible under the system dynamics. This is problematic for learned reward functions because it entails evaluating them outside of their training distribution, resulting in inaccurate reward values that we show can render EPIC ineffective at comparing rewards. To address this problem, we propose the Dynamics-Aware Reward Distance (DARD), a new reward pseudometric. DARD uses an approximate transition model of the environment to transform reward functions into a form that allows for comparisons that are invariant to reward shaping while only evaluating reward functions on transitions close to their training distribution. Experiments in simulated physical domains demonstrate that DARD enables reliable reward comparisons without policy optimization and is significantly more predictive than baseline methods of downstream policy performance when dealing with learned reward functions.
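To make the notion of shaping invariance concrete, the sketch below assumes that "reward shaping" refers to standard potential-based shaping, R'(s, a, s') = R(s, a, s') + gamma * Phi(s') - Phi(s). Under that assumption the discounted return of any trajectory changes only by a telescoping term that depends on its first and last states, so the two rewards rank policies alike even though their raw values differ. The helper names, the potential `phi`, and the toy trajectory are hypothetical; this is not an implementation of EPIC or DARD.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def shape_rewards(states, rewards, potential, gamma):
    """Apply potential-based shaping along one trajectory.

    states has one more entry than rewards: rewards[t] is received
    on the transition states[t] -> states[t + 1].
    """
    return [
        r + gamma * potential(s_next) - potential(s)
        for r, s, s_next in zip(rewards, states[:-1], states[1:])
    ]


gamma = 0.9
states = [0, 1, 2, 3]
rewards = [1.0, 0.5, 2.0]
phi = lambda s: 3.0 * s  # arbitrary potential function for the example

shaped = shape_rewards(states, rewards, phi, gamma)
gap = discounted_return(shaped, gamma) - discounted_return(rewards, gamma)

# Telescoping: the gap depends only on the first and last states.
expected_gap = gamma ** len(rewards) * phi(states[-1]) - phi(states[0])
```

A distance that is invariant to shaping should therefore treat `rewards` and `shaped` as equivalent, which is the property the abstract attributes to both EPIC and DARD.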