 Agent Societies


Partner Approximating Learners (PAL): Simulation-Accelerated Learning with Explicit Partner Modeling in Multi-Agent Domains

arXiv.org Artificial Intelligence

Mixed cooperative-competitive control scenarios such as human-machine interaction with individual goals of the interacting partners are very challenging for reinforcement learning agents. In order to contribute towards intuitive human-machine collaboration, we focus on problems in the continuous state and control domain where no explicit communication is considered and the agents do not know the others' goals or control laws but only sense their control inputs retrospectively. Our proposed framework combines a learned partner model based on online data with a reinforcement learning agent that is trained in a simulated environment including the partner model. Thus, we overcome drawbacks of independent learners and, in addition, benefit from a reduced amount of real world data required for reinforcement learning which is vital in the human-machine context. We finally analyze an example that demonstrates the merits of our proposed framework which learns fast due to the simulated environment and adapts to the continuously changing partner due to the partner approximation. Keywords: Reinforcement Learning, Mixed Cooperative-Competitive Control, Opponent Modeling.


Static force field representation of environments based on agents' nonlinear motions

arXiv.org Machine Learning

Static Force Field Representation of Environments Based on Agents' Nonlinear Motions. Damian Campo, Alejandro Betancourt, Lucio Marcenaro and Carlo Regazzoni.

Abstract: This paper presents a methodology for incrementally representing areas inside environments in terms of attractive forces. A parametric representation of the velocity fields ruling the dynamics of moving agents is proposed. It is assumed that attractive spots in the environment are responsible for modifying the motion of agents. A switching model is used to describe near and far velocity fields, which in turn are used to learn attractive characteristics of environments. The effect of such areas is assumed to be radial over the whole scene. Based on the estimation of attractive areas, a map that describes their effects in terms of their locations, ranges of action and intensities is derived online. Information about static attractive areas is added dynamically to a set of filters that describes possible interactions between moving agents and an environment. The proposed approach is first evaluated on synthetic data and then applied to real trajectories of pedestrians moving in an indoor environment.

Keywords: Kalman filtering; Interactive force models; Trajectory analysis; Representation of environments; Situation awareness

1 Introduction. Analysis of the trajectories of moving entities in environments is an important topic for fields such as video surveillance [1] and crowd/vehicle analysis [2, 3], and in general for monitoring systems, in which the dynamics of agents can lead to a better understanding of patterns and situations of interest [4, 5]. Abnormality detection is one of the most explored applications involving trajectory analysis. In this approach, characterizing agents' motions makes it possible to learn and identify normal and abnormal situations in a given environment. In general, approaches for abnormality detection are based on a set of observations that define the regular behaviors in a scene. Abnormalities are then defined as behaviors that do not match the patterns previously learned as normal, i.e., behaviors that have not been observed before [6].


Non-Bayesian Social Learning with Uncertain Models

arXiv.org Artificial Intelligence

Non-Bayesian social learning theory provides a framework that models distributed inference for a group of agents interacting over a social network. In this framework, each agent iteratively forms beliefs about an unknown state of the world and communicates them to its neighbors using a learning rule. Existing approaches assume agents have access to precise statistical models (in the form of likelihoods) for the state of the world. However, in many situations such models must be learned from finite data. We propose a social learning rule that takes into account uncertainty in the statistical models using second-order probabilities. As a result, beliefs derived from uncertain models are sensitive to the amount of past evidence collected for each hypothesis. We characterize how well the hypotheses can be tested on a social network as consistent or not with the state of the world. We explicitly show how the generated beliefs depend on the amount of prior evidence. Moreover, as the amount of prior evidence goes to infinity, learning occurs and is consistent with traditional social learning theory.
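
As a point of comparison, the traditional setting with precise likelihoods is often illustrated with a log-linear update, in which each agent multiplies the likelihood of its own observation by a weighted geometric mean of its neighbors' beliefs. The sketch below shows that baseline only, not the uncertain-model rule proposed in the paper; the two-agent network, the weights, and the likelihood values are all assumptions chosen for illustration.

```python
import math

def social_update(beliefs, weights, likelihoods):
    """One log-linear social learning step (baseline rule, precise likelihoods).

    beliefs[j][h]     : agent j's current belief in hypothesis h (must be > 0)
    weights[i][j]     : weight agent i places on agent j (each row sums to 1)
    likelihoods[i][h] : likelihood of agent i's new observation under h
    """
    updated = []
    for i, row in enumerate(weights):
        log_b = [
            math.log(likelihoods[i][h])
            + sum(w * math.log(beliefs[j][h]) for j, w in enumerate(row))
            for h in range(len(beliefs[i]))
        ]
        peak = max(log_b)  # subtract the maximum for numerical stability
        unnorm = [math.exp(v - peak) for v in log_b]
        total = sum(unnorm)
        updated.append([u / total for u in unnorm])
    return updated

# Hypothetical setup: hypothesis 0 is true, agent 0 has an informative model,
# agent 1 cannot distinguish the hypotheses from its own data.
weights = [[0.5, 0.5], [0.5, 0.5]]
likelihoods = [[0.6, 0.4], [0.5, 0.5]]
beliefs = [[0.5, 0.5], [0.5, 0.5]]

for _ in range(20):
    beliefs = social_update(beliefs, weights, likelihoods)
```

In this toy run the uninformed agent still concentrates its belief on hypothesis 0, because evidence reaches it through its neighbor.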


Signal Instructed Coordination in Team Competition

arXiv.org Artificial Intelligence

Most existing models of multi-agent reinforcement learning (MARL) adopt the centralized training with decentralized execution framework. We demonstrate that the decentralized execution scheme restricts agents' capacity to find a better joint policy in team competition games, where each team of agents shares common rewards and cooperates to compete against other teams. To resolve this problem, we propose Signal Instructed Coordination (SIC), a novel coordination module that can be integrated with most existing models. SIC casts a common signal sampled from a pre-defined distribution to team members, and adopts an information-theoretic regularization to encourage agents to exploit the instruction carried by the centralized signal during learning. Our experiments show that SIC consistently improves team performance over well-recognized MARL models on matrix games and predator-prey games.


Multi-Objective Multi-Agent Decision Making: A Utility-based Analysis and Survey

arXiv.org Artificial Intelligence

The majority of multi-agent system (MAS) implementations aim to optimise agents' policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective multi-agent systems (MOMAS) explicitly consider the possible trade-offs between conflicting objective functions. We argue that, in MOMAS, such compromises should be analysed on the basis of the utility that these compromises have for the users of a system. As is standard in multi-objective optimisation, we model the user utility using utility functions that map value or return vectors to scalar values. This approach naturally leads to two different optimisation criteria: expected scalarised returns (ESR) and scalarised expected returns (SER). We develop a new taxonomy which classifies multi-objective multi-agent decision making settings, on the basis of the reward structures, and which and how utility functions are applied. This allows us to offer a structured view of the field, to clearly delineate the current state-of-the-art in multi-objective multi-agent decision making approaches and to identify promising directions for future research. Starting from the execution phase, in which the selected policies are applied and the utility for the users is attained, we analyse which solution concepts apply to the different settings in our taxonomy. Furthermore, we define and discuss these solution concepts under both ESR and SER optimisation criteria. We conclude with a summary of our main findings and a discussion of many promising future research directions in multi-objective multi-agent systems.
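
The difference between the two criteria can be seen in a small numerical sketch. ESR applies the utility function to each return vector and then takes the expectation, E[u(R)], while SER takes the expectation of the return vector first and then applies the utility, u(E[R]). The utility function and the return distribution below are hypothetical and chosen only to make the gap visible.

```python
def utility(returns):
    # Hypothetical nonlinear utility: the product of the two objectives.
    return returns[0] * returns[1]

# A policy that yields one of two return vectors with equal probability.
outcomes = [(4.0, 0.0), (0.0, 4.0)]
probs = [0.5, 0.5]

# ESR: expectation of the utility of each realised return vector.
esr = sum(p * utility(r) for p, r in zip(probs, outcomes))

# SER: utility of the expected return vector.
expected = tuple(sum(p * r[k] for p, r in zip(probs, outcomes)) for k in range(2))
ser = utility(expected)

print(esr, ser)
```

With a linear utility the two values coincide; the criteria only diverge when the utility is nonlinear, as in this example.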


A Reinforcement Learning Based Approach for Joint Multi-Agent Decision Making

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) is being increasingly applied to optimize complex functions that may have a stochastic component. Under the umbrella of Multi-Agent RL (MARL), RL is extended to multi-agent systems to find policies for systems that require agents to coordinate or to compete. A crucial factor in the success of RL is that the optimization problem is represented as the expected sum of rewards, which allows the use of backward induction for the solution. However, many real-world problems require a joint objective that is non-linear, so dynamic programming cannot be applied directly. For example, in a resource allocation problem, one of the objectives is to maximize long-term fairness among the users. This paper addresses and formalizes the problem of joint objective optimization, where not only the sum of rewards of each agent but a function of the sum of rewards of each agent needs to be optimized. The proposed algorithms, run at a centralized controller, aim to learn a policy that dictates the actions of each agent such that the joint objective function based on the average per-step rewards of each agent is maximized. We propose both model-based and model-free algorithms, where the model-based algorithm is shown to achieve a $\tilde{O}(\sqrt{\frac{K}{T}})$ regret bound for $K$ agents over a time horizon $T$, and the model-free algorithm can be implemented using deep neural networks. Further, using fairness in cellular base-station scheduling as an example, the proposed algorithms are shown to significantly outperform state-of-the-art approaches.
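
To see why a non-linear joint objective changes which policy is preferred, consider a toy scheduling sketch. This is not the paper's algorithm; the two-user setup, the reward rates, and the logarithmic fairness function are assumptions made purely for illustration. Each step schedules one user, who then receives that user's rate as reward.

```python
import math

RATES = (3.0, 1.0)  # hypothetical per-step reward when user k is scheduled

def average_rewards(schedule):
    """Average per-step reward of each user under a fixed schedule."""
    totals = [0.0] * len(RATES)
    for user in schedule:
        totals[user] += RATES[user]
    return [t / len(schedule) for t in totals]

def sum_objective(avg):
    # Linear objective: total average reward across users.
    return sum(avg)

def fairness_objective(avg, eps=1e-6):
    # Assumed fairness-style objective: sum of logs of average rewards.
    # eps keeps the logarithm finite when a user is never scheduled.
    return sum(math.log(a + eps) for a in avg)

greedy = [0] * 8          # always schedule the user with the higher rate
alternating = [0, 1] * 4  # take turns

avg_greedy = average_rewards(greedy)
avg_alternating = average_rewards(alternating)
```

The greedy schedule wins on the linear objective, while the alternating schedule wins on the fairness objective, because the logarithm heavily penalizes a user whose average reward is near zero.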


From Few to More: Large-scale Dynamic Multiagent Curriculum Learning

arXiv.org Artificial Intelligence

Considerable effort has been devoted to investigating how agents can learn effectively and achieve coordination in multiagent systems. However, this remains challenging in large-scale multiagent settings due to the complex dynamics between the environment and agents and the explosion of the state-action space. In this paper, we design a novel Dynamic Multiagent Curriculum Learning (DyMA-CL) approach that solves large-scale problems by starting from a multiagent scenario of small size and progressively increasing the number of agents. We propose three transfer mechanisms across curricula to accelerate the learning process. Moreover, because the state dimension varies across curricula, existing network structures cannot be applied in such a transfer setting, since their input sizes are fixed. We therefore design a novel network structure called Dynamic Agent-number Network (DyAN) to handle the dynamic size of the network input. Experimental results show that DyMA-CL using DyAN greatly improves the performance of large-scale multiagent learning compared with state-of-the-art deep reinforcement learning approaches. We also investigate the influence of the three transfer mechanisms across curricula through extensive simulations.


Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control

arXiv.org Machine Learning

Multi-agent reinforcement learning (MARL) has recently received considerable attention due to its applicability to a wide range of real-world applications. However, achieving efficient communication among agents has always been an overarching problem in MARL. In this work, we propose Variance Based Control (VBC), a simple yet efficient technique to improve communication efficiency in MARL. By limiting the variance of the messages exchanged between agents during the training phase, the noisy component of the messages can be eliminated effectively, while the useful part is preserved and used by the agents for better performance. Our evaluation on a challenging set of StarCraft II benchmarks indicates that our method achieves $2$-$10\times$ lower communication overhead than state-of-the-art MARL algorithms, while allowing agents to collaborate better by developing sophisticated strategies.
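
One simple way to limit message variance during training is to add a variance penalty to the task loss. The sketch below illustrates that general idea under stated assumptions and should not be read as the VBC loss itself; the function names, the penalty weight, and the message format (one list of floats per agent) are all hypothetical.

```python
from statistics import pvariance

def variance_penalty(messages, weight=0.1):
    """Weighted mean of the per-message population variance.

    messages: one list of floats per agent (hypothetical message format).
    """
    return weight * sum(pvariance(m) for m in messages) / len(messages)

def regularized_loss(task_loss, messages, weight=0.1):
    # Total training loss: task loss plus the variance penalty on messages.
    return task_loss + variance_penalty(messages, weight)

flat = [[1.0, 1.0, 1.0]]  # zero-variance message: no penalty
noisy = [[0.0, 2.0]]      # population variance 1.0: penalty equals the weight

loss_flat = regularized_loss(1.0, flat)
loss_noisy = regularized_loss(1.0, noisy)
```

Under this penalty a constant message costs nothing, so the optimizer is pushed to keep messages flat unless varying them reduces the task loss by more than the penalty.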


Modelling Bushfire Evacuation Behaviours

arXiv.org Artificial Intelligence

Bushfires pose a significant threat to Australia's regional areas. To minimise risk and increase resilience, communities need robust evacuation strategies that account for people's likely behaviour both before and during a bushfire. Agent-based modelling (ABM) offers a practical way to simulate a range of bushfire evacuation scenarios. However, the ABM should reflect the diversity of possible human responses in a given community. The Belief-Desire-Intention (BDI) cognitive model captures behaviour in a compact representation that is understandable by domain experts. Within a BDI-ABM simulation, individual BDI agents can be assigned profiles that determine their likely behaviour. Over a population of agents their collective behaviour will characterise the community response. These profiles are drawn from existing human behaviour research and consultation with emergency services personnel and capture the expected behaviours of identified groups in the population, both prior to and during an evacuation. A realistic representation of each community can then be formed, and evacuation scenarios within the simulation can be used to explore the possible impact of population structure on outcomes. It is hoped that this will give an improved understanding of the risks associated with evacuation, and lead to tailored evacuation plans for each community to help them prepare for and respond to bushfire.


Iterative Update and Unified Representation for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Multi-agent systems have a wide range of applications in cooperative and competitive tasks. As the number of agents increases, nonstationarity becomes more severe in multi-agent reinforcement learning (MARL), which makes the learning process much harder. In addition, current mainstream algorithms give each agent an independent network, so memory usage increases linearly with the number of agents, which greatly slows down interaction with the environment. Inspired by Generative Adversarial Networks (GAN), this paper proposes an iterative update method (IU) to stabilize the nonstationary environment. Further, we add a first-person perspective and represent all agents with a single network, which changes the computation of agents' policies from sequential to batched. Similar to continual lifelong learning, we realize the iterative update method in this unified representative network (IUUR). In this method, iterative update greatly alleviates the nonstationarity of the environment, while unified representation speeds up interaction with the environment and avoids the linear growth of memory usage. Moreover, this method does not interfere with decentralized execution and distributed deployment. Experiments show that, compared with MADDPG, our algorithm achieves state-of-the-art performance and saves wall-clock time by a large margin, especially with more agents.