Goto

Collaborating Authors

 Agents


Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning

arXiv.org Machine Learning

We consider the networked multi-agent reinforcement learning (MARL) problem in a fully decentralized setting, where agents learn to coordinate to achieve the joint success. This problem is widely encountered in many areas including traffic control, distributed control, and smart grids. We assume that the reward function for each agent can be different and observed only locally by the agent itself. Furthermore, each agent is located at a node of a communication network and can exchanges information only with its neighbors. Using softmax temporal consistency and a decentralized optimization method, we obtain a principled and data-efficient iterative algorithm. In the first step of each iteration, an agent computes its local policy and value gradients and then updates only policy parameters. In the second step, the agent propagates to its neighbors the messages based on its value function and then updates its own value function. Hence we name the algorithm value propagation. We prove a non-asymptotic convergence rate 1/T with the nonlinear function approximation. To the best of our knowledge, it is the first MARL algorithm with convergence guarantee in the control, off-policy and non-linear function approximation setting. We empirically demonstrate the effectiveness of our approach in experiments.


Identifying artificial intelligence 'blind spots'

#artificialintelligence

A novel model developed by MIT and Microsoft researchers identifies instances in which autonomous systems have "learned" from training examples that don't match what's actually happening in the real world. Engineers could use this model to improve the safety of artificial intelligence systems, such as driverless vehicles and autonomous robots. The AI systems powering driverless cars, for example, are trained extensively in virtual simulations to prepare the vehicle for nearly every event on the road. But sometimes the car makes an unexpected error in the real world because an event occurs that should, but doesn't, alter the car's behavior. Consider a driverless car that wasn't trained, and more importantly doesn't have the sensors necessary, to differentiate between distinctly different scenarios, such as large, white cars and ambulances with red, flashing lights on the road.


Google's AI, Chatbase Adds a New Service to Design More Versatile Virtual Agents Quicker

#artificialintelligence

Chatbase is a conversational AI platform that replaces the risky status quo approach with a data-driven one based on Google's world-class machine learning and search capabilities. The results include faster development (by up to 10x) of a more helpful and versatile virtual agent, and happier customers! Initially, Chatbase provided a free-to-use analytics service for measuring and optimizing any AI-powered chatbot. After analyzing hundreds of thousands of bots and billions of messages in the first 18 months of existence, Google had two revelations about how to help bot builders in a more impactful way: one, that customer service virtual agents would become the primary use case for the technology; and two, that using ML to glean insights from live-chat transcripts at scale would drastically shorten development time for those agents while creating a better consumer experience. With those lessons learned, Chatbase Virtual Agent Modeling (currently available via an EAP) was born.


AI bots team up to wrangle digital swine in Minecraft

#artificialintelligence

Wrangling a pig--even a virtual one--is much easier if you get a friend to help. This much seems clear from a contest organized by Microsoft researchers to test how artificially intelligent agents could cooperate to solve tricky problems. How best to cooperate with your pig-wrangling pal is another question. The competition addresses an area of artificial intelligence that has had relatively little attention so far. AI researchers often develop software capable of performing a specific human task, such as playing chess or Go, and then measure it according to its ability to defeat a human player.


Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

Humans are capable of attributing latent mental contents such as beliefs, or intentions to others. The social skill is critical in everyday life to reason about the potential consequences of their behaviors so as to plan ahead. It is known that humans use this reasoning ability recursively, i.e. considering what others believe about their own beliefs. In this paper, we start from level-$1$ recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. Our hypothesis is that it is beneficial for each agent to account for how the opponents would react to its future behaviors. Under the PR2 framework, we adopt variational Bayes methods to approximate the opponents' conditional policy, to which each agent finds the best response and then improve their own policy. We develop decentralized-training-decentralized-execution algorithms, PR2-Q and PR2-Actor-Critic, that are proved to converge in the self-play scenario when there is one Nash equilibrium. Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge. Our experiments show that it is critical to reason about how the opponents believe about what the agent believes. We expect our work to contribute a new idea of modeling the opponents to the multi-agent reinforcement learning community.


Multi-Agent Generalized Recursive Reasoning

arXiv.org Artificial Intelligence

We propose a new reasoning protocol called generalized recursive reasoning (GR2), and embed it into the multi-agent reinforcement learning (MARL) framework. The GR2 model defines reasoning categories: level-$0$ agent acts randomly, and level-$k$ agent takes the best response to a mixed type of agents that are distributed over level $0$ to $k-1$. The GR2 leaners can take into account the bounded rationality, and it does not need the assumption that the opponent agents play Nash strategy in all stage games, which many MARL algorithms require. We prove that when the level $k$ is large, the GR2 learners will converge to at least one Nash Equilibrium (NE). In addition, if lower-level agents play the NE, high-level agents will surely follow as well. We evaluate the GR2 Soft Actor-Critic algorithms in a series of games and high-dimensional environment; results show that the GR2 methods have faster convergence speed than strong MARL baselines.


AI Dominates Human Professional Players in StarCraft II

#artificialintelligence

An artificial intelligence has defeated two top-ranked human players in the computer game StarCraft II, using some strategies rarely encountered before. On Thursday, gamers were able to watch the AI agent, called AlphaStar, expertly command armies of "Protoss" units against the professional players. The result: The AI beat the humans 10 out of the 11 matches. "I was surprised by how strong the agent was," said Dario "TLO" Wรผnsch, one of the human players. "AlphaStar takes well-known strategies and turns them on their head."


A simple blueprint for building AI-powered customer service on GCP Google Cloud Blog

#artificialintelligence

As a Google Cloud customer engineer based in Amsterdam, I work with a lot of banks and insurance companies in the Netherlands. All of them have this common requirement: to help customer service agents (many of whom are poorly trained interns due to the expense of hiring) handle large numbers of customer calls, especially at the end of the year when many consumers want to change or update their insurance plan. Most of these requests are predictable and easily resolved with the exchange of a small amount of information, which is a perfect use case for an AI-powered customer service agent. Virtual agents can provide non-queued service around the clock, and can easily be programmed to handle simple requests as well as do a hand-off to well-trained live agents for more complicated issues. Furthermore, a well-designed solution can help ensure that consumer requests, regardless of the channel in which they are received (phone, chat, IoT), are routed to the correct resource.


Identifying artificial intelligence 'blind spots'

#artificialintelligence

A novel model developed by MIT and Microsoft researchers identifies instances in which autonomous systems have "learned" from training examples that don't match what's actually happening in the real world. Engineers could use this model to improve the safety of artificial intelligence systems, such as driverless vehicles and autonomous robots. The AI systems powering driverless cars, for example, are trained extensively in virtual simulations to prepare the vehicle for nearly every event on the road. But sometimes the car makes an unexpected error in the real world because an event occurs that should, but doesn't, alter the car's behavior. Consider a driverless car that wasn't trained, and more importantly doesn't have the sensors necessary, to differentiate between distinctly different scenarios, such as large, white cars and ambulances with red, flashing lights on the road.


Self-driving cars, robots: Identifying AI 'blind spots'

#artificialintelligence

The AI systems powering driverless cars, for example, are trained extensively in virtual simulations to prepare the vehicle for nearly every event on the road. But sometimes the car makes an unexpected error in the real world because an event occurs that should, but doesn't, alter the car's behavior. Consider a driverless car that wasn't trained, and more importantly doesn't have the sensors necessary, to differentiate between distinctly different scenarios, such as large, white cars and ambulances with red, flashing lights on the road. If the car is cruising down the highway and an ambulance flicks on its sirens, the car may not know to slow down and pull over, because it does not perceive the ambulance as different from a big white car. In a pair of papers -- presented at last year's Autonomous Agents and Multiagent Systems conference and the upcoming Association for the Advancement of Artificial Intelligence conference -- the researchers describe a model that uses human input to uncover these training "blind spots."