Agents
Noah Schwartz, Co-Founder & CEO of Quorum – Interview Series
Noah is an AI systems architect. Prior to founding Quorum, Noah spent 12 years in academic research, first at the University of Southern California and most recently at Northwestern as the Assistant Chair of Neurobiology. His work focused on information processing in the brain and he has translated his research into products in augmented reality, brain-computer interfaces, computer vision, and embedded robotics control systems. Your interest in AI and robotics started when you were a little boy. How were you first introduced to these technologies?
Optimising Game Tactics for Football
Beal, Ryan, Chalkiadakis, Georgios, Norman, Timothy J., Ramchurn, Sarvapali D.
In this paper we present a novel approach to optimise tactical and strategic decision making in football (soccer). We model the game of football as a multi-stage game which is made up of a Bayesian game to model the pre-match decisions and a stochastic game to model the in-match state transitions and decisions. Using this formulation, we propose a method to predict the probability of game outcomes and the payoffs of team actions. Building upon this, we develop algorithms to optimise team formation and in-game tactics with different objectives. Empirical evaluation of our approach on real-world datasets from 760 matches shows that by using optimised tactics from our Bayesian and stochastic games, we can increase a team's chances of winning by up to 16.1% and 3.4% respectively.
Graph Neural Networks for Decentralized Controllers
Gama, Fernando, Tolstaya, Ekaterina, Ribeiro, Alejandro
Dynamical systems comprised of autonomous agents arise in many relevant problems such as multi-agent robotics, smart grids, or smart cities. Controlling these systems is of paramount importance to guarantee a successful deployment. Optimal centralized controllers are readily available but face limitations in terms of scalability and practical implementation. Optimal decentralized controllers, on the other hand, are difficult to find. In this paper, we use graph neural networks (GNNs) to learn decentralized controllers from data. GNNs are well-suited for the task since they are naturally distributed architectures. Furthermore, they are equivariant and stable, leading to good scalability and transferability properties. The problem of flocking is explored to illustrate the power of GNNs in learning decentralized controllers.
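To illustrate why a graph convolutional layer is "naturally distributed", here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`neighbor_exchange`, `graph_filter`, `gnn_layer`), the path graph, the filter taps, and the `tanh` nonlinearity are all assumptions chosen for illustration. The layer computes a polynomial in a graph shift operator `S` (here an adjacency matrix), so a filter with K+1 taps only ever uses information from agents at most K hops away.

```python
import math

def neighbor_exchange(S, z):
    """One communication round: node i aggregates z[j] weighted by S[i][j].

    S[i][j] is nonzero only when j is a neighbor of i, so each node
    needs nothing beyond the values held by its direct neighbors.
    """
    n = len(z)
    return [sum(S[i][j] * z[j] for j in range(n)) for i in range(n)]

def graph_filter(S, x, taps):
    """Compute y = sum_k taps[k] * S^k x using repeated neighbor exchanges."""
    y = [0.0] * len(x)
    z = list(x)
    for k, h in enumerate(taps):
        if k > 0:
            z = neighbor_exchange(S, z)  # z now holds S^k x
        y = [yi + h * zi for yi, zi in zip(y, z)]
    return y

def gnn_layer(S, x, taps):
    """Graph filter followed by a pointwise nonlinearity (tanh is an assumption)."""
    return [math.tanh(v) for v in graph_filter(S, x, taps)]

# Hypothetical example: path graph 0 - 1 - 2 - 3, one scalar state per agent.
S = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [1.0, -0.5, 0.25, 2.0]
taps = [0.5, 0.3]  # two taps: own state plus 1-hop neighbors
u = gnn_layer(S, x, taps)
```

With two taps, changing the state of agent 3 leaves the outputs of agents 0 and 1 untouched, because they are more than one hop away. Relabeling the agents permutes the output in the same way, which is the equivariance property the abstract refers to.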
Anticipatory Psychological Models for Quickest Change Detection: Human Sensor Interaction
We consider anticipatory psychological models for human decision makers and their effect on sequential decision making. From a decision theoretic point of view, such models are time inconsistent, meaning that Bellman's principle of optimality does not hold. The aim of this paper is to study how such an anxiety-based anticipatory utility can affect sequential decision making, such as quickest change detection, in multi-agent systems. We show that the interaction between anticipation-driven agents and a sequential decision maker results in an unusual (nonconvex) structure of the optimal decision policy. The methodology yields a useful mathematical framework for sensor interaction involving a human decision maker (with behavioral economics constraints) and a sensor equipped with an automated sequential detector.
Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward
Sheikh, Hassam Ullah, Bölöni, Ladislau
Many cooperative multi-agent problems require agents to learn individual tasks while contributing to the collective success of the group. This is a challenging task for current state-of-the-art multi-agent reinforcement learning algorithms that are designed to either maximize the global reward of the team or the individual local rewards. The problem is exacerbated when either of the rewards is sparse, leading to unstable learning. To address this problem, we present Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG): a novel cooperative multi-agent reinforcement learning framework that simultaneously learns to maximize the global and local rewards. We evaluate our solution on the challenging defensive escort team problem and show that our solution achieves a significantly better and more stable performance than the direct adaptation of the MADDPG algorithm.
Modeling Contrary-to-Duty with CP-nets
Calegari, Roberta, Loreggia, Andrea, Lorini, Emiliano, Rossi, Francesca, Sartor, Giovanni
Modelling deontic notions through preferences [12] has the advantage of linking deontic notions to the manifold research on preferences, in multiple disciplines, such as philosophy, mathematics, economics and politics. In recent years, preferences have also been addressed within AI [15,8,18] and applications can be found in multi-agent systems [19] and recommender systems [17]. We shall model deontic notions through ceteris-paribus preferences, namely, conditional preferences for a state of affairs over another state of affairs, all the rest being equal. In particular, we shall focus on the ceteris-paribus preference for a proposition over its complement. The idea of ceteris-paribus preferences was originally introduced by the philosopher and logician Georg von Wright [22].
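The ceteris-paribus preference for a proposition over its complement can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the CP-net machinery of the paper: states are represented as dictionaries from hypothetical proposition names to truth values, and the function name `preferred_ceteris_paribus` is invented here.

```python
def preferred_ceteris_paribus(state_a, state_b, prop):
    """Return True if state_a is preferred to state_b under a
    ceteris-paribus preference for `prop` over its complement.

    The preference only applies when the two states agree on every
    other proposition ("all the rest being equal"); otherwise it is
    silent and this function returns False.
    Assumes both states assign a truth value to the same propositions.
    """
    rest_equal = all(state_a[k] == state_b[k] for k in state_a if k != prop)
    return bool(rest_equal and state_a[prop] and not state_b[prop])

# Hypothetical states over two propositions.
kept = {"keep_promise": True, "raining": False}
broken = {"keep_promise": False, "raining": False}
broken_in_rain = {"keep_promise": False, "raining": True}

comparable = preferred_ceteris_paribus(kept, broken, "keep_promise")
incomparable = preferred_ceteris_paribus(kept, broken_in_rain, "keep_promise")
```

Here `comparable` is true because the two states differ only in `keep_promise`, while `incomparable` is false because the states also differ on `raining`, so the ceteris-paribus preference says nothing about them.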
Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning
Long, Qian, Zhou, Zihan, Gupta, Abhibav, Fang, Fei, Wu, Yi, Wang, Xiaolong
In multi-agent games, the complexity of the environment can grow exponentially as the number of agents increases, so it is particularly challenging to learn good policies when the agent population is large. In this paper, we introduce Evolutionary Population Curriculum (EPC), a curriculum learning paradigm that scales up Multi-Agent Reinforcement Learning (MARL) by progressively increasing the population of training agents in a stage-wise manner. Furthermore, EPC uses an evolutionary approach to fix an objective misalignment issue throughout the curriculum: agents successfully trained in an early stage with a small population are not necessarily the best candidates for adapting to later stages with scaled populations. Concretely, EPC maintains multiple sets of agents in each stage, performs mix-and-match and fine-tuning over these sets and promotes the sets of agents with the best adaptability to the next stage. We implement EPC on a popular MARL algorithm, MADDPG, and empirically show that our approach consistently outperforms baselines by a large margin as the number of agents grows exponentially. The project page is https://sites.google.com/view/epciclr2020.
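The stage-wise loop described above (maintain several agent sets, mix and match them, fine-tune, promote the best) can be sketched as follows. This is a toy sketch, not the EPC implementation: agents are reduced to a single hypothetical "skill" number, `fine_tune` and `evaluate` are placeholder stand-ins for MARL training and evaluation, and pairing sets with `combinations_with_replacement` (so a set may be paired with itself) is an assumption about how candidates are formed.

```python
from itertools import combinations_with_replacement

def fine_tune(population):
    # Hypothetical stand-in for MARL fine-tuning: nudge each agent's skill upward.
    return [skill + 0.01 for skill in population]

def evaluate(population):
    # Hypothetical fitness score: mean skill of the population.
    return sum(population) / len(population)

def epc_stage(agent_sets, keep):
    """One curriculum stage: mix-and-match, fine-tune, promote the best sets."""
    # Mix-and-match: join pairs of parent sets, doubling the population size.
    candidates = [a + b for a, b in combinations_with_replacement(agent_sets, 2)]
    # Fine-tune every candidate population at the new, larger scale.
    tuned = [fine_tune(c) for c in candidates]
    # Promote the `keep` best-scoring populations to the next stage.
    tuned.sort(key=evaluate, reverse=True)
    return tuned[:keep]

def run_curriculum(agent_sets, stages, keep):
    """Apply `stages` rounds of population doubling and selection."""
    for _ in range(stages):
        agent_sets = epc_stage(agent_sets, keep)
    return agent_sets

# Three parallel sets of 2 agents each; two stages grow them to 8 agents each.
initial_sets = [[0.1, 0.2], [0.5, 0.4], [0.3, 0.3]]
final_sets = run_curriculum(initial_sets, stages=2, keep=3)
```

Keeping several sets per stage, rather than a single winner, is what lets the selection step discard agents that did well at a small scale but adapt poorly once the population grows.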
Social navigation with human empowerment driven reinforcement learning
van der Heiden, Tessa, Weiss, Christian, Shankar, Naveen Nagaraja, van Hoof, Herke
The next generation of mobile robots needs to be socially-compliant to be accepted by humans. As simple as this task may seem, defining compliance formally is not trivial. Yet, classical reinforcement learning (RL) relies upon hard-coded reward signals. In this work, we go beyond this approach and provide the agent with intrinsic motivation using empowerment. Empowerment maximizes the influence of an agent on its near future and has been shown to be a good model for biological behaviors. It also has been used for artificial agents to learn complicated and generalized actions. Self-empowerment maximizes the influence of an agent on its future. In contrast, our robot strives for the empowerment of people in its environment, so they are not disturbed by the robot when pursuing their goals. We show that our robot has a positive influence on humans, as it minimizes the travel time and distance of humans while moving efficiently to its own goal. The method can be used in any multi-agent system that requires a robot to solve a particular task involving human interactions.
Global Big Data Conference
Artificial intelligence (AI) is getting a bad reputation. And while Forrester predicts that 1 out of 4 CX professionals will lose their jobs in 2020, it has little to do with employers implementing AI. The real reason is the growing business-criticality of CX. In fact, brands will spend $8 billion more on customer service agents in 2020 than in 2019 due to heightened demand and competition for highly skilled agents. The truth is plain and simple: AI isn't replacing contact center agents -- it's helping them step up to be the best they can be and deliver more value than ever.