Agents
5 Empathetic Design Principles for Successful Human-Agent Interaction
Nonverbal communication is a crucial element of this – in both regular and fallback experiences – as it helps the agent convey information in a way that feels more instinctive and familiar to us. Dogs do an amazing job of this – they nonverbally communicate with us in a way that's clear and easy to understand, and it's important for the agent to do the same. Nonverbal communication can be a far less intrusive interface than voice alone, and can attract the user's attention in a subtle, yet effective way. For example, when a user calls out ElliQ's wake word, ElliQ's face lights up and head bends forward, leaning in to indicate listening. This endearing behavior not only draws the user in, it also explicitly conveys to them that their request or attempted interaction was indeed successful, and that they should proceed accordingly.
Google builds AI agent that learns to generalize to new environments by ignoring distractions
In a study accepted earlier this year to the Genetic and Evolutionary Computation Conference (GECCO) 2020, Google researchers investigate the properties of AI software agents that employ self-attention bottlenecks. They claim that these agents not only demonstrate an aptitude for solving challenging vision-based tasks, but that they're also better at tackling slight modifications of those tasks, due to their blindness to details that might confuse them. Inattentional blindness is the phenomenon that causes a person to miss things in plain sight; it's a consequence of selective attention, a mechanism that's believed to enable humans to condense information into a form compact enough for decision-making. Luminaries like Yann LeCun assert it can inspire the design of AI systems that better mimic the elegance and efficiency of biological organisms. The Google researchers' proposed agent -- AttentionAgent -- aims to devote most of its attention to task-relevant elements, ignoring distractions.
Policy Evaluation and Seeking for Multi-Agent Reinforcement Learning via Best Response
Yan, Rui, Duan, Xiaoming, Shi, Zongying, Zhong, Yisheng, Marden, Jason R., Bullo, Francesco
This paper introduces two metrics (cycle-based and memory-based metrics), grounded in a dynamical game-theoretic solution concept called sink equilibrium, for the evaluation, ranking, and computation of policies in multi-agent learning. We adopt strict best response dynamics (SBRD) to model selfish behaviors at a meta-level for multi-agent reinforcement learning. Our approach can deal with dynamical cyclical behaviors (unlike approaches based on Nash equilibria and Elo ratings), and is more compatible with single-agent reinforcement learning than alpha-rank, which relies on weakly better responses. We first consider settings where the difference between the largest and second-largest underlying metric has a known lower bound. With this knowledge we propose a class of perturbed SBRD with the following property: only policies with maximum metric are observed with nonzero probability for a broad class of stochastic games with finite memory. We then consider settings where the lower bound for the difference is unknown. For this setting, we propose a class of perturbed SBRD such that the metrics of the policies observed with nonzero probability differ from the optimal by at most any given tolerance. The proposed perturbed SBRD addresses the opponent-induced non-stationarity by fixing the strategies of others for the learning agent, and uses empirical game-theoretic analysis to estimate payoffs for each strategy profile obtained due to the perturbation.
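To make the idea of strict best response dynamics more concrete, the following is a minimal sketch on a two-player matrix game, not the paper's algorithm. It assumes that a player moves only when its best response is strictly better than its current action, and it follows a single path until a profile repeats. The function names (`strict_best_responses`, `follow_dynamics`) and the example payoff tables are illustrative assumptions; the paper's cycle-based and memory-based metrics and its perturbation scheme are not reproduced here.

```python
from typing import List, Sequence, Tuple

Profile = Tuple[int, int]
Matrix = Sequence[Sequence[float]]


def strict_best_responses(payoffs: Tuple[Matrix, Matrix], profile: Profile) -> List[Profile]:
    """Profiles reachable when one player switches to a strictly better best response."""
    a, b = payoffs  # a[i][j]: row player's payoff, b[i][j]: column player's payoff
    i, j = profile
    successors = []
    # Row player deviates while the column player's action is held fixed.
    best_row = max(range(len(a)), key=lambda r: a[r][j])
    if a[best_row][j] > a[i][j]:
        successors.append((best_row, j))
    # Column player deviates while the row player's action is held fixed.
    best_col = max(range(len(b[0])), key=lambda c: b[i][c])
    if b[i][best_col] > b[i][j]:
        successors.append((i, best_col))
    return successors


def follow_dynamics(payoffs: Tuple[Matrix, Matrix], start: Profile) -> List[Profile]:
    """Follow the first available strict best response until a profile repeats.

    Returns the recurrent part of the path: a single profile if no player
    wants to move, otherwise the cycle that the path falls into.
    """
    path = [start]
    seen = {start: 0}
    current = start
    while True:
        successors = strict_best_responses(payoffs, current)
        if not successors:
            return [current]
        current = successors[0]  # illustrative tie-breaking: row player moves first
        if current in seen:
            return path[seen[current]:]
        seen[current] = len(path)
        path.append(current)


# Matching pennies: no pure Nash equilibrium, so the dynamics cycle.
matching_pennies = ([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])
# Prisoner's dilemma (action 0 = cooperate, 1 = defect): dynamics settle on mutual defection.
prisoners_dilemma = ([[3, 0], [5, 1]], [[3, 5], [0, 1]])

print(follow_dynamics(matching_pennies, (0, 0)))
print(follow_dynamics(prisoners_dilemma, (0, 0)))
```

In matching pennies the path visits all four profiles in a loop, which is the kind of cyclical behavior that a Nash-based evaluation has no pure equilibrium to describe. A sink equilibrium is defined on the full response graph rather than on one path, so this sketch only hints at the concept.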
A Computational Lens on Economics
The COVID-19 pandemic is a dual crisis. On one hand, it is a global health crisis with millions of cases and hundreds of thousands of deaths. At the same time, decisions by individuals and governments in response to the pandemic have led to a severe economic slowdown, the likes of which have not been seen since the Great Depression in the 20th century. But, as I wrote in a May 2020 column, economics can be argued to be one of the roots of this dual crisis. I quoted William Galston, who wrote: "What if the relentless pursuit of efficiency, which has dominated American business thinking for decades, has made the global economic system more vulnerable to shocks?"
Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning
Li, Sheng, Gupta, Jayesh K., Morales, Peter, Allen, Ross, Kochenderfer, Mykel J.
Multi-agent reinforcement learning (MARL) requires coordination to efficiently solve certain tasks. Fully centralized control is often infeasible in such domains due to the size of joint action spaces. Coordination graph based formalizations allow reasoning about the joint action based on the structure of interactions. However, they often require domain expertise in their design. This paper introduces the deep implicit coordination graph (DICG) architecture for such scenarios. DICG consists of a module for inferring the dynamic coordination graph structure, which is then used by a graph neural network based module to learn to implicitly reason about the joint actions or values. DICG allows learning the tradeoff between full centralization and decentralization via standard actor-critic methods to significantly improve coordination for domains with a large number of agents. We apply DICG to both centralized-training-centralized-execution and centralized-training-decentralized-execution regimes. We demonstrate that DICG solves the relative overgeneralization pathology in predator-prey tasks and outperforms various MARL baselines on the challenging StarCraft II Multi-agent Challenge (SMAC) and traffic junction environments.
Optimal Statistical Hypothesis Testing for Social Choice
We address the following question in this paper: "What are the most robust statistical methods for social choice?" By leveraging the theory of uniformly least favorable distributions in the Neyman-Pearson framework to finite models and randomized tests, we characterize uniformly most powerful (UMP) tests, a well-accepted notion of statistical optimality with respect to robustness, for testing whether a given alternative is the winner under Mallows' model and under Condorcet's model, respectively.
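As background for the models named above, here is a minimal sketch of Mallows' model in its common form, where the probability of observing a ranking decays geometrically with its Kendall tau distance from a reference ranking. It does not implement the paper's UMP tests. The dispersion parameter `phi` (assumed to lie strictly between 0 and 1) and the function names are illustrative assumptions, and the normalizing constant is computed by brute-force enumeration, which is only practical for a handful of alternatives.

```python
from itertools import combinations, permutations
from typing import Sequence


def kendall_tau(r1: Sequence[int], r2: Sequence[int]) -> int:
    """Number of pairs of alternatives that the two rankings order differently."""
    position_in_r2 = {item: idx for idx, item in enumerate(r2)}
    # combinations(r1, 2) yields pairs (a, b) with a ranked above b in r1.
    return sum(1 for a, b in combinations(r1, 2)
               if position_in_r2[a] > position_in_r2[b])


def mallows_probability(ranking: Sequence[int], reference: Sequence[int], phi: float) -> float:
    """P(ranking | reference) proportional to phi ** kendall_tau(ranking, reference)."""
    weight = phi ** kendall_tau(ranking, reference)
    # Brute-force normalizer over all permutations (fine only for small sets).
    normalizer = sum(phi ** kendall_tau(p, reference) for p in permutations(reference))
    return weight / normalizer


reference = (0, 1, 2)
for vote in permutations(reference):
    print(vote, round(mallows_probability(vote, reference, phi=0.5), 4))
```

Under this form, votes that agree with the reference ranking are the most likely, and each additional pairwise disagreement multiplies the probability by `phi`.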
Modelling Agent Policies with Interpretable Imitation Learning
Bewley, Tom, Lawry, Jonathan, Richards, Arthur
As we deploy autonomous agents in safety-critical domains, it becomes important to develop an understanding of their internal mechanisms and representations. We outline an approach to imitation learning for reverse-engineering black box agent policies in MDP environments, yielding simplified, interpretable models in the form of decision trees. As part of this process, we explicitly model and learn agents' latent state representations by selecting from a large space of candidate features constructed from the Markov state.
Contextual and Possibilistic Reasoning for Coalition Formation
Bikakis, Antonis, Caire, Patrice
In multiagent systems, agents often have to rely on other agents to reach their goals, for example when they lack a needed resource or do not have the capability to perform a required action. Agents therefore need to cooperate. Then, some of the questions raised are: Which agent(s) to cooperate with? What are the potential coalitions in which agents can achieve their goals? As the number of possibilities is potentially quite large, how to automate the process? And then, how to select the most appropriate coalition, taking into account the uncertainty in the agents' abilities to carry out certain tasks? In this article, we address the question of how to find and evaluate coalitions among agents in multiagent systems using MCS tools, while taking into consideration the uncertainty around the agents' actions. Our methodology is the following: We first compute the solution space for the formation of coalitions using a contextual reasoning approach. Second, we model agents as contexts in Multi-Context Systems (MCS), and dependence relations among agents seeking to achieve their goals, as bridge rules. Third, we systematically compute all potential coalitions using algorithms for MCS equilibria, and given a set of functional and non-functional requirements, we propose ways to select the best solutions. Finally, in order to handle the uncertainty in the agents' actions, we extend our approach with features of possibilistic reasoning. We illustrate our approach with an example from robotics.
Generalization of Agent Behavior through Explicit Representation of Context
Tutum, Cem C, Abdulquddos, Suhaib, Miikkulainen, Risto
In order to deploy autonomous agents in digital interactive environments, they must be able to act robustly in unseen situations. The standard machine learning approach is to include as much variation as possible in training these agents. The agents can then interpolate within their training, but they cannot extrapolate much beyond it. This paper proposes a principled approach where a context module is coevolved with a skill module in the game. The context module recognizes the temporal variation in the game and modulates the outputs of the skill module so that the action decisions can be made robustly even in previously unseen situations. The approach is evaluated in the Flappy Bird and LunarLander video games, as well as in the CARLA autonomous driving simulation. The Context+Skill approach leads to significantly more robust behavior in environments that require extrapolation beyond training. Such a principled generalization ability is essential in deploying autonomous agents in real-world tasks, and can serve as a foundation for continual adaptation as well.
Particle Swarm Optimization with Velocity Restriction and Evolutionary Parameters Selection for Scheduling Problem
Matrenin, Pavel, Sekaev, Viktor
The article presents a study of the Particle Swarm Optimization method for the scheduling problem. To improve the method's performance, a restriction on particle velocity and an evolutionary meta-optimization were implemented. The proposed approach uses genetic algorithms to select the parameters of Particle Swarm Optimization. Experiments were carried out on test instances of the job-shop scheduling problem. This research demonstrates the applicability of the approach and shows the importance of tuning the behavioral parameters of swarm intelligence methods to achieve high performance.
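The velocity restriction mentioned above can be sketched as follows. This is a generic textbook-style Particle Swarm Optimization loop with per-dimension velocity clamping, shown on a continuous toy objective (the sphere function) rather than on job-shop scheduling, and it is not the authors' implementation. The parameter names (`w`, `c1`, `c2`, `v_max`) and their default values are assumptions; they are the kind of behavioral parameters a genetic-algorithm meta-optimizer would tune, and that outer loop is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple


def clamp(value: float, limit: float) -> float:
    """Restrict a velocity component to the interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def pso_minimize(objective: Callable[[Sequence[float]], float],
                 dim: int,
                 bounds: Tuple[float, float],
                 v_max: float,
                 n_particles: int = 20,
                 iterations: int = 100,
                 w: float = 0.7,    # inertia weight (assumed value)
                 c1: float = 1.5,   # pull toward the particle's own best (assumed value)
                 c2: float = 1.5,   # pull toward the swarm's best (assumed value)
                 seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_value = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda k: personal_value[k])
    global_best, global_value = personal_best[g][:], personal_value[g]

    for _ in range(iterations):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[k][d]
                     + c1 * r1 * (personal_best[k][d] - positions[k][d])
                     + c2 * r2 * (global_best[d] - positions[k][d]))
                velocities[k][d] = clamp(v, v_max)  # the velocity restriction
                positions[k][d] += velocities[k][d]
            value = objective(positions[k])
            if value < personal_value[k]:
                personal_best[k], personal_value[k] = positions[k][:], value
                if value < global_value:
                    global_best, global_value = positions[k][:], value
    return global_best, global_value


def sphere(x: Sequence[float]) -> float:
    """Toy objective with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)


best_position, best_value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0), v_max=1.0)
print(best_position, best_value)
```

Clamping keeps particles from overshooting the search region when the attraction terms become large; how tight `v_max` should be is problem dependent, which is one reason tuning it alongside the other parameters matters.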