Goto

Collaborating Authors

 Agents


Multi-Agent Adversarial Inverse Reinforcement Learning

arXiv.org Machine Learning

Reinforcement learning agents are prone to undesired behaviors due to reward mis-specification. Finding a set of reward functions to properly guide agent behaviors is particularly challenging in multi-agent scenarios. Inverse reinforcement learning provides a framework to automatically acquire suitable reward functions from expert demonstrations. Its extension to multi-agent settings, however, is difficult due to the more complex notions of rational behaviors. In this paper, we propose MA-AIRL, a new framework for multi-agent inverse reinforcement learning, which is effective and scalable for Markov games with high-dimensional state-action space and unknown dynamics. We derive our algorithm based on a new solution concept and maximum pseudolikelihood estimation within an adversarial reward learning framework. In the experiments, we demonstrate that MA-AIRL can recover reward functions that are highly correlated with ground truth ones, and significantly outperforms prior methods in terms of policy imitation.


Let's Make It Personal, A Challenge in Personalizing Medical Inter-Human Communication

arXiv.org Artificial Intelligence

Current AI approaches have frequently been used to help personalize many aspects of medical experiences and tailor them to a specific individuals' needs. However, while such systems consider medically-relevant information, they ignore socially-relevant information about how this diagnosis should be communicated and discussed with the patient. The lack of this capability may lead to mis-communication, resulting in serious implications, such as patients opting out of the best treatment. Consider a case in which the same treatment is proposed to two different individuals. The manner in which this treatment is mediated to each should be different, depending on the individual patient's history, knowledge, and mental state. While it is clear that this communication should be conveyed via a human medical expert and not a software-based system, humans are not always capable of considering all of the relevant aspects and traversing all available information. We pose the challenge of creating Intelligent Agents (IAs) to assist medical service providers (MSPs) and consumers in establishing a more personalized human-to-human dialogue. Personalizing conversations will enable patients and MSPs to reach a solution that is best for their particular situation, such that a relation of trust can be built and commitment to the outcome of the interaction is assured. We propose a four-part conceptual framework for personalized social interactions, expand on which techniques are available within current AI research and discuss what has yet to be achieved.


Multi-agent Inverse Reinforcement Learning for Two-person Zero-sum Games

arXiv.org Artificial Intelligence

The focus of this paper is a Bayesian framework for solving a class of problems termed multi-agent inverse reinforcement learning (MIRL). Compared to the well-known inverse reinforcement learning (IRL) problem, MIRL is formalized in the context of stochastic games, which generalize Markov decision processes to game theoretic scenarios. We establish a theoretical foundation for competitive two-agent zero-sum MIRL problems and propose a Bayesian solution approach in which the generative model is based on an assumption that the two agents follow a minimax bi-policy. Numerical results are presented comparing the Bayesian MIRL method with two existing methods in the context of an abstract soccer game. Investigation centers on relationships between the extent of prior information and the quality of learned rewards. Results suggest that covariance structure is more important than mean value in reward priors.


An Information-theoretic On-line Learning Principle for Specialization in Hierarchical Decision-Making Systems

arXiv.org Machine Learning

Information-theoretic bounded rationality describes utility-optimizing decision-makers whose limited information-processing capabilities are formalized by information constraints. One of the consequences of bounded rationality is that resource-limited decision-makers can join together to solve decision-making problems that are beyond the capabilities of each individual. Here, we study an information-theoretic principle that drives division of labor and specialization when decision-makers with information constraints are joined together. We devise an on-line learning rule of this principle that learns a partitioning of the problem space such that it can be solved by specialized linear policies. We demonstrate the approach for decision-making problems whose complexity exceeds the capabilities of individual decision-makers, but can be solved by combining the decision-makers optimally. The strength of the model is that it is abstract and principled, yet has direct applications in classification, regression, reinforcement learning and adaptive control.


On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

arXiv.org Artificial Intelligence

How to best explore in domains with sparse, delayed, and deceptive rewards is an important open problem for reinforcement learning (RL). This paper considers one such domain, the recently-proposed multi-agent benchmark of Pommerman. This domain is very challenging for RL --- past work has shown that model-free RL algorithms fail to achieve significant learning without artificially reducing the environment's complexity. In this paper, we illuminate reasons behind this failure by providing a thorough analysis on the hardness of random exploration in Pommerman. While model-free random exploration is typically futile, we develop a model-based automatic reasoning module that can be used for safer exploration by pruning actions that will surely lead the agent to death. We empirically demonstrate that this module can significantly improve learning.


Action Semantics Network: Considering the Effects of Actions in Multiagent Systems

arXiv.org Artificial Intelligence

In multiagent systems (MASs), each agent makes individual decisions but all of them contribute globally to the system evolution. Learning in MASs is difficult since the selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainties increase exponentially with the number of agents. A number of previous works borrow various multiagent coordination mechanisms into deep multiagent learning architecture to facilitate multiagent coordination. However, none of them explicitly consider action semantics between agents. In this paper, we propose a novel network architecture, named Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes different actions' influence on other agents using neural networks based on the action semantics between agents. ASN can be easily combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II and Neural MMO show ASN significantly improves the performance of state-of-the-art DRL approaches compared with a number of network architectures.


Google Research Football: A Novel Reinforcement Learning Environment

arXiv.org Machine Learning

Recent progress in the field of reinforcement learning has been accelerated by virtual learning environments such as video games, where novel algorithms and ideas can be quickly tested in a safe and reproducible manner. We introduce the Google Research F ootball Environment, a new reinforcement learning environment where agents are trained to play football in an advanced, physics-based 3D simulator. The resulting environment is challenging, easy to use and customize, and it is available under a permissive open-source license. In addition, it provides support for multiplayer and multi-agent experiments. We propose three full-game scenarios of varying difficulty with the F ootball Benchmarks and report baseline results for three commonly used reinforcement algorithms (IMP ALA, PPO, and Ape-X DQN). We also provide a diverse set of simpler scenarios with the F ootball Academy and showcase several promising research directions.


Distributed Artificial Intelligence

#artificialintelligence

Let's start from the broader classification. Distributed Artificial Intelligence (DAI) is a class of technologies and methods that span from swarm intelligence to multi-agent technologies and that basically concerns the development of distributed solutions for a specific problem. It can mainly be used for learning, reasoning, and planning, and it is one of the subsets of AI where simulation has a way greater importance than point-prediction. In this class of systems, autonomous learning processing agents (distributed at large scale and independent) reach conclusions or a semi-equilibrium through interaction and communication (even asynchronously). One of the big benefits of those with respect to neural networks is that they do not require the same amount of data to work -- far to say though these are simple systems.


Weighted Matching Markets with Budget Constraints

Journal of Artificial Intelligence Research

We investigate markets with a set of students on one side and a set of colleges on the other. A student and college can be linked by a weighted contract that defines the student's wage, while a college's budget for hiring students is limited. Stability is a crucial requirement for matching mechanisms to be applied in the real world. A standard stability requirement is coalitional stability, i.e., no pair of a college and group of students has any incentive to deviate. We find that a coalitionally stable matching is not guaranteed to exist, verifying the coalitional stability for a given matching is coNP-complete, and the problem of finding whether a coalitionally stable matching exists in a given market, is SigmaP2-complete: NPNP-complete. Other negative results also hold when blocking coalitions contain at most two students and one college. Given these computational hardness results, we pursue a weaker stability requirement called pairwise stability, where no pair of a college and single student has an incentive to deviate. Unfortunately, a pairwise stable matching is not guaranteed to exist either. Thus, we consider a restricted market called a typed weighted market, in which students are partitioned into types that induce their possible wages. We then design a strategy-proof and Pareto efficient mechanism that works in polynomial-time for computing a pairwise stable matching in typed weighted markets.


Fairness in Reinforcement Learning

arXiv.org Artificial Intelligence

Decision support systems (e.g., for ecological conservation) and autonomous systems (e.g., adaptive controllers in smart cities) start to be deployed in real applications. Although their operations often impact many users or stakeholders, no fairness consideration is generally taken into account in their design, which could lead to completely unfair outcomes for some users or stakeholders. To tackle this issue, we advocate for the use of social welfare functions that encode fairness and present this general novel problem in the context of (deep) reinforcement learning, although it could possibly be extended to other machine learning tasks.