 Agent Societies


PAC Guarantees for Concurrent Reinforcement Learning with Restricted Communication

arXiv.org Machine Learning

We develop model-free PAC performance guarantees for multiple concurrent MDPs, extending recent works where a single learner interacts with multiple non-interacting agents in a noise-free environment. Our framework allows noisy and resource-limited communication between agents, and develops novel PAC guarantees in this extended setting. By allowing communication between the agents themselves, we suggest improved PAC-exploration algorithms that can overcome the communication noise and lead to improved sample complexity bounds. We provide a theoretically motivated algorithm that optimally combines information from the resource-limited agents, thereby analyzing the interaction between noise and communication constraints that are ubiquitous in real-world systems. We present empirical results for a simple task that support our theoretical formulations and improve upon naive information fusion methods.


A Coupled Operational Semantics for Goals and Commitments

Journal of Artificial Intelligence Research

Commitments capture how an agent relates to another agent, whereas goals describe states of the world that an agent is motivated to bring about. Commitments are elements of the social state of a set of agents whereas goals are elements of the private states of individual agents. It makes intuitive sense that goals and commitments are understood as being complementary to each other. More importantly, an agent's goals and commitments ought to be coherent, in the sense that an agent's goals would lead it to adopt or modify relevant commitments and an agent's commitments would lead it to adopt or modify relevant goals. However, despite the intuitive naturalness of the above connections, they have not been adequately studied in a formal framework. This article provides a combined operational semantics for goals and commitments by relating their respective life cycles as a basis for how these concepts (1) cohere for an individual agent and (2) engender cooperation among agents.


Distributed Coalition Formation with Heterogeneous Agents for Task Allocation

AAAI Conferences

In this paper, we study the problem of forming coalitions with heterogeneous agents for allocating them to tasks. Several agents work together to complete a given task. Due to the inherent complexity of real-world tasks and the limited capabilities of a particular type of physical agent such as a robot, it is imperative to form a team consisting of different types of robots to complete the tasks. Our work in this paper proposes a distributed bipartite graph partitioning approach along with a region growing strategy for coalition formation with heterogeneous agents such as humans and/or robots for instantaneous allocation to tasks (ST-MR-IA). We also extend this approach to scenarios where the tasks might have dependencies among each other (ST-MR-TD). We have implemented the proposed algorithms within the Webots simulator. The proposed strategy allocates near-optimal (up to 98%) agent coalitions to tasks. Results also show that our proposed approach can easily handle as many as 100 agents and 10 tasks while spending an almost negligible amount of time.


A Contextual-Based Framework for Opinion Formation

AAAI Conferences

During opinion formation, interacting agents can be assumed to be engaging in learning and decision-making processes to satisfy their individual goals. These goals are determined by the agents' preferences, which are often unknown, complex and unpredictable. Most opinion formation frameworks, however, assume static preferences and fail to model practical situations where human preferences change. We propose a new framework to simulate the process of opinion formation under uncertainty and dynamism. Agents who are unaware of their implicit contextual preferences utilize inverse reinforcement learning to compute reward functions that determine their preferences. Reinforcement learning is subsequently used to optimize the agents' behavior and satisfy their individual goals. The novelty of our approach lies in its ability to capture uncertainty and dynamism in the agent's preferences, which are assumed to be unknown initially. This framework is compared to a baseline method based on reinforcement learning, and results show its ability to perform better under dynamic scenarios.


QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

We explore value-based solutions for multi-agent reinforcement learning (MARL) tasks in the centralized training with decentralized execution (CTDE) regime popularized recently. VDN and QMIX are representative examples that use the idea of factorization of the joint action-value function into individual ones for decentralized execution. However, VDN and QMIX address only a fraction of factorizable MARL tasks due to their structural constraints in factorization, such as additivity and monotonicity. In this paper, we propose a new factorization method for MARL, QTRAN, which is free from such structural constraints and takes a new approach of transforming the original joint action-value function into an easily factorizable one with the same optimal actions. QTRAN guarantees more general factorization than VDN or QMIX, thus covering a much wider class of MARL tasks than do previous methods. Our experiments for the tasks of multi-domain Gaussian-squeeze and modified predator-prey demonstrate QTRAN's superior performance, with especially larger margins in games whose payoffs penalize non-cooperative behavior more aggressively.
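
To make the additivity constraint mentioned above concrete, here is a minimal sketch of a VDN-style factorization, in which the joint action-value is taken to be the sum of per-agent values. It is only an illustration of why additive factorization permits decentralized execution, not QTRAN itself; the function names and the toy Q-values are hypothetical and do not come from the paper.

```python
from itertools import product


def vdn_joint_q(per_agent_q, joint_action):
    """Additive (VDN-style) joint value: sum of each agent's own Q-value."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def decentralized_greedy(per_agent_q):
    """Each agent picks its own best action using only its own Q-values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def centralized_greedy(per_agent_q):
    """Brute-force search over all joint actions for the best joint value."""
    action_ranges = [range(len(q)) for q in per_agent_q]
    return max(product(*action_ranges),
               key=lambda joint: vdn_joint_q(per_agent_q, joint))


# Toy, made-up per-agent Q-values for two agents with three actions each.
per_agent_q = [[0.1, 0.7, 0.3], [0.5, 0.2, 0.9]]

# Under additivity, the per-agent greedy choices also maximize the joint value.
print(decentralized_greedy(per_agent_q), centralized_greedy(per_agent_q))
```

Because the joint value is a plain sum, maximizing each term separately maximizes the whole. Joint value functions that cannot be written this way fall outside what an additive factorization can represent, which is the limitation the abstract refers to.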


On the Detection of Mutual Influences and Their Consideration in Reinforcement Learning Processes

arXiv.org Artificial Intelligence

Self-adaptation has been proposed as a mechanism to counter complexity in control problems of technical systems. A major driver behind self-adaptation is the idea to transfer traditional design-time decisions to runtime and into the responsibility of systems themselves. In order to deal with unforeseen events and conditions, systems need creativity -- typically realized by means of machine learning capabilities. Such learning mechanisms are based on different sources of knowledge. Feedback from the environment used for reinforcement purposes is probably the most prominent one within the self-adapting and self-organizing (SASO) systems community. However, the impact of other (sub-)systems on the success of the individual system's learning performance has mostly been neglected in this context. In this article, we propose a novel methodology to identify effects of actions performed by other systems in a shared environment on the utility achievement of an autonomous system. Consider smart cameras (SC) as an illustrative example: for goals such as 3D reconstruction of objects, the most promising configuration of one SC in terms of pan/tilt/zoom parameters depends largely on the configuration of other SCs in the vicinity. Since such mutual influences cannot be pre-defined for dynamic systems, they have to be learned at runtime. Furthermore, they have to be taken into consideration when a system self-improves its own configuration decisions based on a feedback loop concept, e.g., known from the SASO domain or the Autonomic and Organic Computing initiatives. We define a methodology to detect such influences at runtime, present an approach to consider this information in a reinforcement learning technique, and analyze the behavior in artificial as well as real-world SASO system settings.


The sharp, the flat and the shallow: Can weakly interacting agents learn to escape bad minima?

arXiv.org Machine Learning

An open problem in machine learning is whether flat minima generalize better and how to compute such minima efficiently. This is a very challenging problem. As a first step towards understanding this question, we formalize it as an optimization problem with weakly interacting agents. We review appropriate background material from the theory of stochastic processes and provide insights that are relevant to practitioners. We propose an algorithmic framework for an extended stochastic gradient Langevin dynamics and illustrate its potential. The paper is written as a tutorial, and presents an alternative use of multi-agent learning. Our primary focus is on the design of algorithms for machine learning applications; however, the underlying mathematical framework is suitable for the understanding of large-scale systems of agent-based models that are popular in the social sciences, economics and finance.
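
For readers unfamiliar with Langevin dynamics, the following is a minimal sketch of the standard (non-extended) Langevin update: a gradient step plus Gaussian noise scaled by the step size and a temperature. It is not the extended, interacting-agent scheme the paper proposes; the function names, the quadratic test objective, and the parameter values are all assumptions chosen for illustration.

```python
import math
import random


def langevin_step(theta, grad, step_size, temperature, rng):
    """One Langevin update: theta - step * grad + sqrt(2 * step * T) * noise."""
    noise_scale = math.sqrt(2.0 * step_size * temperature)
    return [t - step_size * g + noise_scale * rng.gauss(0.0, 1.0)
            for t, g in zip(theta, grad)]


def quadratic_grad(theta):
    """Gradient of the toy objective f(theta) = sum(theta_i ** 2)."""
    return [2.0 * t for t in theta]


rng = random.Random(0)  # seeded so runs are repeatable
theta = [1.0, -2.0]
for _ in range(100):
    theta = langevin_step(theta, quadratic_grad(theta),
                          step_size=0.1, temperature=0.01, rng=rng)
print(theta)  # hovers near the minimum at the origin, perturbed by noise
```

Setting `temperature` to zero removes the noise term and recovers plain gradient descent. In practice the exact gradient is replaced by a minibatch estimate, which is what makes the method "stochastic gradient" Langevin dynamics.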


Learned human-agent decision-making, communication and joint action in a virtual reality environment

arXiv.org Artificial Intelligence

Humans make decisions and act alongside other humans to pursue both short-term and long-term goals. As a result of ongoing progress in areas such as computing science and automation, humans now also interact with nonhuman agents of varying complexity as part of their day-to-day activities; substantial work is being done to integrate increasingly intelligent machine agents into human work and play. With increases in the cognitive, sensory, and motor capacity of these agents, intelligent machinery for human assistance can now reasonably be considered to engage in joint action with humans--i.e., two or more agents adapting their behaviour and their understanding of each other so as to progress in shared objectives or goals. The mechanisms, conditions, and opportunities for skillful joint action in human-machine partnerships are of great interest to multiple communities. Despite this, human-machine joint action is as yet underexplored, especially in cases where a human and an intelligent machine interact in a persistent way during the course of real-time, daily-life experience (as opposed to specialized, episodic, or time-limited settings such as game play, teaching, or task-focused personal computing applications). In this work, we contribute a virtual reality environment wherein a human and an agent can adapt their predictions, their actions, and their communication so as to pursue a simple foraging task. In a case study with a single participant, we provide an example of human-agent coordination and decision-making involving prediction learning on the part of the human and the machine agent, and control learning on the part of the machine agent wherein audio communication signals are used to cue its human partner in service of acquiring shared reward. These comparisons suggest the utility of studying human-machine coordination in a virtual reality environment, and identify further research that will expand our understanding of persistent human-machine joint action.


Autonomous Air Traffic Controller: A Deep Multi-Agent Reinforcement Learning Approach

arXiv.org Machine Learning

Air traffic control is a real-time, safety-critical decision-making process in highly dynamic and stochastic environments. In today's aviation practice, a human air traffic controller monitors and directs many aircraft flying through their designated airspace sector. With the fast-growing air traffic complexity in traditional (commercial airliners) and low-altitude (drones and eVTOL aircraft) airspace, an autonomous air traffic control system is needed to accommodate high-density air traffic and ensure safe separation between aircraft. We propose a deep multi-agent reinforcement learning framework that is able to identify and resolve conflicts between aircraft in a high-density, stochastic, and dynamic en-route sector with multiple intersections and merging points. The proposed framework utilizes an actor-critic model, A2C, that incorporates the loss function from Proximal Policy Optimization (PPO) to help stabilize the learning process. In addition, we use a centralized learning, decentralized execution scheme where one neural network is learned and shared by all agents in the environment. We show that our framework is both scalable and efficient for a large number of incoming aircraft, achieving extremely high traffic throughput with a safety guarantee. We evaluate our model via extensive simulations in the BlueSky environment. Results show that our framework is able to resolve 99.97% and 100% of all conflicts at intersections and merging points, respectively, in extreme high-density air traffic scenarios.
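
As a rough illustration of the PPO loss function mentioned above, here is a minimal sketch of the clipped surrogate objective from PPO, written in plain Python over lists of numbers. It shows only the clipping idea; how the authors combine it with A2C is not specified in the abstract. The function name, the `clip_eps=0.2` default, and the sample inputs are assumptions.

```python
def ppo_clipped_loss(ratios, advantages, clip_eps=0.2):
    """Negative mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratios:     new-policy / old-policy action probabilities, one per sample
    advantages: advantage estimates for the same samples
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Taking the minimum gives a pessimistic bound on the improvement.
        terms.append(min(r * a, clipped_r * a))
    return -sum(terms) / len(terms)


# Hypothetical samples: one ratio above the clip range, one below it.
print(ppo_clipped_loss([1.5, 0.5], [1.0, -1.0]))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the stabilizing effect the abstract alludes to.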


Argus: Smartphone-enabled Human Cooperation via Multi-Agent Reinforcement Learning for Disaster Situational Awareness

arXiv.org Artificial Intelligence

Argus exploits a Multi-Agent Reinforcement Learning (MARL) framework to create a 3D mapping of the disaster scene using agents present around the incident zone to facilitate the rescue operations. The agents can be human bystanders at the disaster scene as well as drones or robots that can assist the humans. The agents capture images of the scene using their smartphones (or on-board cameras in the case of drones) as directed by the MARL algorithm. These images are used to build a 3D map of the disaster scene in real time. Via both simulations and real experiments, an evaluation of the framework in terms of effectiveness in tracking random dynamicity of the environment is presented.