Agents
Bottom-Up Meta-Policy Search
Melo, Luckeciano C., Maximo, Marcos R. O. A., da Cunha, Adilson Marques
Despite recent progress in agents that learn through interaction, challenges remain in sample efficiency and in generalization to behaviors unseen during training. To mitigate these problems, we propose and apply a first-order Meta-Learning algorithm called Bottom-Up Meta-Policy Search (BUMPS), which uses a two-phase optimization procedure: first, in a meta-training phase, it distills a few expert policies to create a meta-policy capable of generalizing knowledge to tasks unseen during training; second, it applies a fast adaptation strategy named Policy Filtering, which evaluates a few policies sampled from the meta-policy distribution and selects the one that best solves the task. We conducted all experiments in the RoboCup 3D Soccer Simulation domain, in the context of kick motion learning. We show that, given our experimental setup, BUMPS works in scenarios where simple multi-task Reinforcement Learning does not. Finally, we performed experiments to evaluate each component of the algorithm.
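The Policy Filtering step described above (sample a few policies from the meta-policy, evaluate each, keep the best) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `policy_filtering`, `sample_policy`, and `make_task` are hypothetical, a "policy" is reduced to a single scalar parameter, and the meta-policy is a plain Gaussian.

```python
import random


def policy_filtering(sample_policy, evaluate, num_candidates, rng):
    """Sample a few candidate policies and return the best one with its score."""
    candidates = [sample_policy(rng) for _ in range(num_candidates)]
    returns = [evaluate(policy) for policy in candidates]
    best = max(range(num_candidates), key=lambda i: returns[i])
    return candidates[best], returns[best]


# Toy stand-ins (assumptions, not from the paper): a policy is one scalar
# drawn from a Gaussian meta-policy, and a task rewards closeness to a target.
def sample_policy(rng):
    return rng.gauss(0.0, 1.0)


def make_task(target):
    return lambda policy: -(policy - target) ** 2


rng = random.Random(0)
task = make_task(0.5)
best_policy, best_return = policy_filtering(sample_policy, task, 8, rng)
print(best_policy, best_return)
```

The adaptation cost here is just `num_candidates` task evaluations, which is what makes this kind of selection step cheap compared with further gradient-based training.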
Towards a Theory of Systems Engineering Processes: A Principal-Agent Model of a One-Shot, Shallow Process
Safarkhani, Salar, Bilionis, Ilias, Panchal, Jitesh
Systems engineering processes coordinate the effort of different individuals to generate a product satisfying certain requirements. As the involved engineers are self-interested agents, the goals at different levels of the systems engineering hierarchy may deviate from the system-level goals, which may cause budget and schedule overruns. Therefore, there is a need for a systems engineering theory that accounts for human behavior in systems design. To this end, the objective of this paper is to develop and analyze a principal-agent model of a one-shot (single iteration), shallow (one level of hierarchy) systems engineering process. We assume that the systems engineer maximizes the expected utility of the system, while the subsystem engineers seek to maximize their own expected utilities. Furthermore, the systems engineer is unable to monitor the effort of the subsystem engineers and may not have complete information about their types or the complexity of the design task. However, the systems engineer can incentivize the subsystem engineers by proposing specific contracts. To obtain an optimal incentive, we pose and numerically solve a bi-level optimization problem. Through extensive simulations, we study the optimal incentives arising from different system-level value functions under various combinations of effort costs, problem-solving skills, and task complexities.
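The bi-level structure mentioned above (the agent best-responds to a contract, and the principal optimizes the contract while anticipating that response) can be illustrated with a toy grid search. Everything specific here is an assumption made for illustration and is not taken from the paper: one subsystem engineer, a success probability equal to the effort, a quadratic effort cost, and a contract that pays a bonus only on success.

```python
def agent_best_effort(bonus, cost, efforts):
    # Inner problem: the subsystem engineer picks the effort that maximizes
    # expected bonus minus a quadratic effort cost (success prob = effort).
    return max(efforts, key=lambda e: bonus * e - cost * e ** 2)


def optimal_contract(value, cost, bonuses, efforts):
    # Outer problem: the systems engineer cannot observe effort, so it
    # anticipates the agent's best response to each candidate bonus.
    best = None
    for bonus in bonuses:
        effort = agent_best_effort(bonus, cost, efforts)
        payoff = (value - bonus) * effort  # expected value minus expected bonus
        if best is None or payoff > best[2]:
            best = (bonus, effort, payoff)
    return best


efforts = [i / 200 for i in range(201)]  # candidate effort levels in [0, 1]
bonuses = [j / 100 for j in range(101)]  # candidate bonuses in [0, 1]
bonus, effort, payoff = optimal_contract(1.0, 1.0, bonuses, efforts)
print(bonus, effort, payoff)
```

With these toy numbers the agent's best response is half the bonus, so the search settles on a bonus of 0.5 and an effort of 0.25. The principal gives up half of the system value to induce any effort at all.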
Collaborative Graph Walk for Semi-supervised Multi-Label Node Classification
Akujuobi, Uchenna, Yufei, Han, Zhang, Qiannan, Zhang, Xiangliang
In this work, we study the semi-supervised multi-label node classification problem in attributed graphs. Classic solutions to multi-label node classification follow two steps: first learn a node embedding, then build a node classifier on the learned embedding. To improve the discriminating power of the node embedding, we propose a novel collaborative graph walk, named Multi-Label-Graph-Walk, to finely tune node representations with the available label assignments in attributed graphs via reinforcement learning. The proposed method formulates the multi-label node classification task as simultaneous graph walks conducted by multiple label-specific agents. Furthermore, policies of the label-wise graph walks are learned in a cooperative way to capture, first, the predictive relation between node labels and structural attributes of graphs and, second, the correlation among the multiple label-specific classification tasks. A comprehensive experimental study demonstrates that the proposed method can achieve significantly better multi-label classification performance than the state-of-the-art approaches and conduct more efficient graph exploration. Index Terms: Multi-label node classification, semi-supervised attributed graph embedding, reinforcement learning. I. INTRODUCTION Graph-structured data are frequently encountered in many real-world applications, such as social graphs and academic graphs. In the graph structure, nodes represent entities (e.g., users in social graphs and papers in citation graphs), whereas an edge linking two nodes denotes the relationship between the entities (e.g., user friendship and paper citation). Usually both nodes and edges possess their own attributes.
A New Framework for Multi-Agent Reinforcement Learning -- Centralized Training and Exploration with Decentralized Execution via Policy Distillation
Deep reinforcement learning (DRL) is a booming area of artificial intelligence. Many practical applications of DRL naturally involve more than one collaborative learner, making it important to study DRL in a multi-agent context. Previous research showed that effective learning in complex multi-agent systems demands highly coordinated environment exploration among all the participating agents. Many researchers attempted to cope with this challenge through learning centralized value functions. However, the common strategy of having every agent learn its local policy directly often fails to nurture strong inter-agent collaboration and can be sample inefficient whenever agents alter their communication channels. To address these issues, we propose a new framework known as centralized training and exploration with decentralized execution via policy distillation. Guided by this framework and the maximum-entropy learning technique, we will first train agents' policies with a shared global component to foster coordinated and effective learning. Locally executable policies will be derived subsequently from the trained global policies via policy distillation. Experiments show that our new framework and algorithm can achieve significantly better performance and higher sample efficiency than a cutting-edge baseline on several multi-agent DRL benchmarks.
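Policy distillation, in its generic form, fits a student policy to a teacher policy's action distribution, typically by minimizing a KL divergence. The sketch below shows that idea for a single state with three actions. It is an assumption-laden toy, not the framework proposed in the abstract: the teacher distribution is made up, the student is a bare vector of logits, and the names `distill` and `kl` are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def kl(p, q):
    # KL(p || q) for discrete distributions over the same actions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill(teacher_probs, steps=500, lr=0.5):
    # Gradient descent on KL(teacher || student) with respect to the student
    # logits; for a softmax student that gradient is (student - teacher).
    logits = [0.0] * len(teacher_probs)
    for _ in range(steps):
        student = softmax(logits)
        logits = [z - lr * (s - t)
                  for z, s, t in zip(logits, student, teacher_probs)]
    return softmax(logits)


teacher = [0.7, 0.2, 0.1]  # assumed action distribution of a "global" policy
student = distill(teacher)
print(student, kl(teacher, student))
```

In a real setting the student would be a network conditioned only on local observations, and the loss would be averaged over states visited by the teacher.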
Learning interaction kernels in heterogeneous systems of agents from multiple trajectories
Lu, Fei, Maggioni, Mauro, Tang, Sui
Systems of interacting particles or agents have wide applications in many disciplines such as Physics, Chemistry, Biology and Economics. These systems are governed by interaction laws, which are often unknown: estimating them from observation data is a fundamental task that can provide meaningful insights and accurate predictions of the behaviour of the agents. In this paper, we consider the inverse problem of learning interaction laws given data from multiple trajectories, in a nonparametric fashion, when the interaction kernels depend on pairwise distances. We establish a condition for learnability of interaction kernels, and construct estimators that are guaranteed to converge in a suitable $L^2$ space, at the optimal min-max rate for 1-dimensional nonparametric regression. We propose an efficient learning algorithm based on least squares, which can be implemented in parallel for multiple trajectories and is therefore well-suited for the high dimensional, big data regime. Numerical simulations on a variety of examples, including opinion dynamics, predator-swarm dynamics and heterogeneous particle dynamics, suggest that the learnability condition is satisfied in models used in practice, and the rate of convergence of our estimator is consistent with the theory. These simulations also suggest that our estimators are robust to noise in the observations, and produce accurate predictions of dynamics in relatively large time intervals, even when they are learned from data collected in short time intervals.
Intelligence via ultrafilters: structural properties of some intelligence comparators of deterministic Legg-Hutter agents
Legg and Hutter, as well as subsequent authors, considered intelligent agents through the lens of interaction with reward-giving environments, attempting to assign numeric intelligence measures to such agents, with the guiding principle that a more intelligent agent should gain higher rewards from environments in some aggregate sense. In this paper, we consider a related question: rather than measure numeric intelligence of one Legg-Hutter agent, how can we compare the relative intelligence of two Legg-Hutter agents? We propose an elegant answer based on the following insight: we can view Legg-Hutter agents as candidates in an election, whose voters are environments, letting each environment vote (via its rewards) which agent (if either) is more intelligent. This leads to an abstract family of comparators simple enough that we can prove some structural theorems about them. It is an open question whether these structural theorems apply to more practical intelligence measures.
Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination
Han, Dongge, Boehmer, Wendelin, Wooldridge, Michael, Rogers, Alex
In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others, and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast their current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach results in inflexibility of agents if options have an extended duration and are dynamic. While adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent's actual behaviour and its broadcast intention. In order to balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows the agents to flexibly terminate their options. We evaluate our model empirically on a set of multi-agent pursuit and taxi tasks, and show that our agents learn to adapt flexibly across scenarios that require different termination behaviours.
Redistribution Mechanism Design on Networks
Zhang, Wen, Zhao, Dengji, Chen, Hanyu
Redistribution mechanisms have been proposed for more efficient resource allocation but not for profit. We consider redistribution mechanism design for the first time in a setting where participants are connected and the resource owner is only aware of her neighbours. In this setting, to make the resource allocation more efficient, the resource owner has to inform the others who are not her neighbours, but her neighbours do not want more participants to compete with them. Hence, the goal is to design a redistribution mechanism such that participants are incentivized to invite more participants and the resource owner does not earn or lose much money from the allocation. We first show that existing redistribution mechanisms cannot be directly applied in the network setting to achieve the goal. Then we propose a novel network-based redistribution mechanism such that all participants in the network are invited, the allocation is more efficient and the resource owner has no deficit. Introduction. The problem of resource allocation has recently caught the public imagination, where the resource owner has to decide the allocation of the item among a group of self-interested agents. Since the valuation differs among agents, it is a natural objective for the owner to pursue the efficiency of the allocation, i.e., allocating the item to the agent with the highest valuation. In many scenarios, the owner does not really aim at making profits but hopes that the wealth remains among the agents. For example, the government wants to build a library in a community that values it most; a charity distributes a donation to the recipient who needs it most; a hospital allocates doctors to rural areas where doctors are in high demand. To find the agent with the highest valuation, one common alternative is to hold an auction (Krishna 2009) under some protocols such as the well-known Vickrey-Clarke-Groves (VCG) mechanism (Vickrey 1961; Clarke 1971; Groves 1973). However, the payments under VCG will all be delivered to the auctioneer, which goes against our nonprofit purpose.
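For a single item, VCG reduces to the second-price (Vickrey) auction: the highest bidder wins and pays the second-highest bid, and that payment goes to the owner. The sketch below illustrates only this baseline, not the network-based mechanism proposed in the paper; the function name `vickrey_auction` and the bidder names and valuations are made up for the example.

```python
def vickrey_auction(bids):
    """Single-item second-price auction. Returns (winner, payment)."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    payment = ranked[1][1]  # the winner pays the second-highest bid
    return winner, payment


# Hypothetical valuations reported by three agents.
bids = {"agent_a": 10.0, "agent_b": 7.0, "agent_c": 4.0}
winner, payment = vickrey_auction(bids)
print(winner, payment)  # agent_a 7.0
```

Here the owner collects 7.0 from `agent_a`. That collected payment is the surplus a redistribution mechanism tries to hand back to the agents rather than leave with the owner.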
Autonomous Industrial Management via Reinforcement Learning: Self-Learning Agents for Decision-Making -- A Review
Leal, Leonardo A. Espinosa, Westerlund, Magnus, Chapman, Anthony
Industry has always pursued greater economic efficiency, and the current focus is on reducing human labour using modern technologies. Even with cutting-edge technologies, which range from packaging robots to AI for fault detection, there is still some ambiguity about the aims of some new systems, namely, whether they are automated or autonomous. In this paper we indicate the distinctions between automated and autonomous systems as well as review the current literature and identify the core challenges for creating learning mechanisms of autonomous agents. We discuss using different types of extended realities, such as digital twins, to train reinforcement learning agents to learn specific tasks through generalization. Once generalization is achieved, we discuss how these can be used to develop self-learning agents. We then introduce self-play scenarios and how they can be used to teach self-learning agents through a supportive environment which focuses on how the agents can adapt to different real-world environments.
Leverage AI to Create Autonomous Policies that Adapts without Human Intervention
Policies are the foundation for any successful organization. Policies are the rules, or laws, of an organization. Heck, one could argue that an organization's culture is better defined by its policies than it is by the character of its leadership team. Unfortunately, the management, creation and execution of policies haven't changed much since the days of "time-and-motion studies". In many cases, policies are nothing more than a static list of what-if rules that govern what workers are to do in well-defined situations.