Goto

Collaborating Authors

 Agents


RLCache: Automated Cache Management Using Reinforcement Learning

arXiv.org Machine Learning

This study investigates the use of reinforcement learning to guide a general purpose cache manager decisions. Cache managers directly impact the overall performance of computer systems. They govern decisions about which objects should be cached, the duration they should be cached for, and decides on which objects to evict from the cache if it is full. These three decisions impact both the cache hit rate and size of the storage that is needed to achieve that cache hit rate. An optimal cache manager will avoid unnecessary operations, maximise the cache hit rate which results in fewer round trips to a slower backend storage system, and minimise the size of storage needed to achieve a high hit-rate. This project investigates using reinforcement learning in cache management by designing three separate agents for each of the cache manager tasks. Furthermore, the project investigates two advanced reinforcement learning architectures for multi-decision problems: a single multi-task agent and a multi-agent. We also introduce a framework to simplify the modelling of computer systems problems as a reinforcement learning task. The framework abstracts delayed experiences observations and reward assignment in computer systems while providing a flexible way to scale to multiple agents. Simulation results based on an established database benchmark system show that reinforcement learning agents can achieve a higher cache hit rate over heuristic driven algorithms while minimising the needed space. They are also able to adapt to a changing workload and dynamically adjust their caching strategy accordingly. The proposed cache manager model is generic and applicable to other types of caches, such as file system caches. This project is the first, to our knowledge, to model cache manager decisions as a multi-task control problem.


Optimal Algorithms for Submodular Maximization with Distributed Constraints

arXiv.org Machine Learning

Optimal Algorithms for Submodular Maximization with Distributed Constraints Alexander Robey, Arman Adibi, Brent Schlotfeldt, George J. Pappas, and Hamed Hassani Abstract -- We consider a class of discrete optimization problems that aim to maximize a submodular objective function subject to a distributed partition matroid constraint. More precisely, we consider a networked scenario in which multiple agents choose actions from local strategy sets with the goal of maximizing a submodular objective function defined over the set of all possible actions. Given this distributed setting, we develop Constraint-Distributed Continuous Greedy ( CDCG), a message passing algorithm that converges to the tight (1 1 /e) approximation factor of the optimum global solution using only local computation and communication. It is known that a sequential greedy algorithm can only achieve a 1 /2 multiplicative approximation of the optimal solution for this class of problems in the distributed setting. Our framework relies on lifting the discrete problem to a continuous domain and developing a consensus algorithm that achieves the tight (1 1 /e) approximation guarantee of the global discrete solution once a proper rounding scheme is applied. We also offer empirical results from a multi-agent area coverage problem to show that the proposed method significantly outperforms the state-of-the-art sequential greedy method. I. INTRODUCTION Recently, the need has arisen to design algorithms that distribute decision making among a collection of agents or computing devices. This need has been motivated by problems from statistics, machine learning and robotics. These problems include: - (Density estimation) What is the best way to estimate a nonparametric density function from a distributed dataset? Inherent to all of these applications is an underlying optimization problem that can be expressed as maximize f (S) (1a) subject to S Y, S I (1b) where f is a submodular function (i.e. it has a diminishing-returns property), Y is a finite ground set of all decision variables, and I is a family of allowable subsets of Y .


Deep Coordination Graphs

arXiv.org Artificial Intelligence

This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning. DCG strikes a flexible trade-off between representational capacity and generalization by factorizing the joint value function of all agents according to a coordination graph into payoffs between pairs of agents. The value can be maximized by local message passing along the graph, which allows training of the value function end-to-end with Q-learning. Payoff functions are approximated with deep neural networks and parameter sharing improves generalization over the state-action space. We show that DCG can solve challenging predator-prey tasks that are vulnerable to the relative overgeneralization pathology and in which all other known value factorization approaches fail.


Interaction-Aware Multi-Agent Reinforcement Learning for Mobile Agents with Individual Goals

arXiv.org Artificial Intelligence

-- In a multi-agent setting, the optimal policy of a single agent is largely dependent on the behavior of other agents. We investigate the problem of multi-agent reinforcement learning, focusing on decentralized learning in non-stationary domains for mobile robot navigation. We identify a cause for the difficulty in training non-stationary policies: mutual adaptation to sub-optimal behaviors, and we use this to motivate a curriculum-based strategy for learning interactive policies. The curriculum has two stages. First, the agent leverages policy gradient algorithms to learn a policy that is capable of achieving multiple goals. Second, the agent learns a modifier policy to learn how to interact with other agents in a multi-agent setting. We evaluated our approach on both an autonomous driving lane-change domain and a robot navigation domain. Single agent reinforcement learning (RL) algorithms have made significant progress in game playing [20] and robotics [13], however, single agent learning algorithms in multi-agent settings are prone to learn stereotyped behaviors that over-fit to the training environment [22], [15]. There are several reasons why multi-agent environments are more difficult: 1) interacting with an unknown agent requires having either multiple responses to a given situation or a more nuanced ability to perceive differences. The former breaks the Markov assumption, the latter rules out simpler solutions which are likely to be found first.


Interactive Decision Making for Autonomous Vehicles in Dense Traffic

arXiv.org Artificial Intelligence

Interactive Decision Making for Autonomous V ehicles in Dense Traffic David Isele 1 Abstract -- Dense urban traffic environments can produce situations where accurate prediction and dynamic models are insufficient for successful autonomous vehicle motion planning. We investigate how an autonomous agent can safely negotiate with other traffic participants, enabling the agent to handle potential deadlocks. Specifically we consider merges where the gap between cars is smaller than the size of the ego vehicle. We propose a game theoretic framework capable of generating and responding to interactive behaviors. Our main contribution is to show how game-tree decision making can be executed by an autonomous vehicle, including approximations and reasoning that make the tree-search computationally tractable. Additionally, to test our model we develop a stochastic rule-based traffic agent capable of generating interactive behaviors that can be used as a benchmark for simulating traffic participants in a crowded merge setting. I NTRODUCTION Much of the long tail around autonomous driving behavior relates to complex interactions between self-interested agents. Since other traffic participants exhibit a great deal of variety and are often neither purely adversarial, nor purely cooperative, it can be difficult to reason about their behavior. However this type of reasoning is essential to numerous traffic situations in congested traffic such as overcrowded merge scenarios depicted in Figure 1.


A Generalized Training Approach for Multiagent Learning

arXiv.org Artificial Intelligence

This paper investigates a population-based training regime based on game-theoretic principles called Policy-Spaced Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have been focused on two-player zero-sum games, a regime wherein Nash equilibria are tractably computable. In moving from two-player zero-sum games to more general settings, computation of Nash equilibria quickly becomes infeasible. Here, we extend the theoretical underpinnings of PSRO by considering an alternative solution concept, {\alpha}-Rank, which is unique (thus faces no equilibrium selection issues, unlike Nash) and tractable to compute in general-sum, many-player settings. We establish convergence guarantees in several games classes, and identify links between Nash equilibria and {\alpha}-Rank. We demonstrate the competitive performance of {\alpha}-Rank-based PSRO against an exact Nash solver-based PSRO in 2-player Kuhn and Leduc Poker. We then go beyond the reach of prior PSRO applications by considering 3- to 5-player poker games, yielding instances where {\alpha}-Rank achieves faster convergence than approximate Nash solvers, thus establishing it as a favorable general games solver. We also carry out an initial empirical validation in MuJoCo soccer, illustrating the feasibility of the proposed approach in another complex domain.


Multi-Agent Actor-Critic with Hierarchical Graph Attention Network

arXiv.org Artificial Intelligence

Most previous studies on multi-agent reinforcement learning focus on deriving decentralized and cooperative policies to maximize a common reward and rarely consider the transferability of trained policies to new tasks. This prevents such policies from being applied to more complex multi-agent tasks. To resolve these limitations, we propose a model that conducts both representation learning for multiple agents using hierarchical graph attention network and policy learning using multi-agent actor-critic. The hierarchical graph attention network is specially designed to model the hierarchical relationships among multiple agents that either cooperate or compete with each other to derive more advanced strategic policies. Two attention networks, the inter-agent and inter-group attention layers, are used to effectively model individual and group level interactions, respectively. The two attention networks have been proven to facilitate the transfer of learned policies to new tasks with different agent compositions and allow one to interpret the learned strategies. Empirically, we demonstrate that the proposed model outperforms existing methods in several mixed cooperative and competitive tasks.


A Survey of Machine Learning Applied to Computer Architecture Design

arXiv.org Artificial Intelligence

Machine learning has enabled significant benefits in diverse fields, but, with a few exceptions, has had limited impact on computer architecture. Recent work, however, has explored broader applicability for design, optimization, and simulation. Notably, machine learning based strategies often surpass prior state-of-the-art analytical, heuristic, and human-expert approaches. This paper reviews machine learning applied system-wide to simulation and run-time optimization, and in many individual components, including memory systems, branch predictors, networks-on-chip, and GPUs. The paper further analyzes current practice to highlight useful design strategies and identify areas for future work, based on optimized implementation strategies, opportune extensions to existing work, and ambitious long term possibilities. Taken together, these strategies and techniques present a promising future for increasingly automated architectural design.


Superintelligence Safety: A Requirements Engineering Perspective

arXiv.org Artificial Intelligence

Under the headline "AI safety", a wide-reaching issue is being discussed, whether in the future some "superhuman artificial intelligence" / "superintelligence" could could pose a threat to humanity. In addition, the late Steven Hawking warned that the rise of robots may be disastrous for mankind. A major concern is that even benevolent superhuman artificial intelligence (AI) may become seriously harmful if its given goals are not exactly aligned with ours, or if we cannot specify precisely its objective function. Metaphorically, this is compared to king Midas in Greek mythology, who expressed the wish that everything he touched should turn to gold, but obviously this wish was not specified precisely enough. In our view, this sounds like requirements problems and the challenge of their precise formulation. (To our best knowledge, this has not been pointed out yet.) As usual in requirements engineering (RE), ambiguity or incompleteness may cause problems. In addition, the overall issue calls for a major RE endeavor, figuring out the wishes and the needs with regard to a superintelligence, which will in our opinion most likely be a very complex software-intensive system based on AI. This may even entail theoretically defining an extended requirements problem.


Synergistic Team Composition: A Computational Approach to Foster Diversity in Teams

arXiv.org Artificial Intelligence

Cooperative learning in heterogeneous teams refers to learning methods in which teams are organised both to accomplish academic tasks and for individuals to gain knowledge. Competencies, personality and the gender of team members are key factors that influence team performance. Here, we introduce a team composition problem, the so-called synergistic team composition problem (STCP), which incorporates such key factors when arranging teams. Thus, the goal of the STCP is to partition a set of individuals into a set of synergistic teams: teams that are diverse in personality and gender and whose members cover all required competencies to complete a task. Furthermore, the STCP requires that all teams are balanced in that they are expected to exhibit similar performances when completing the task. We propose two efficient algorithms to solve the STCP . Our first algorithm is based on a linear programming formulation and is appropriate to solve small instances of the problem. Our second algorithm is an anytime heuristic that is effective for large instances of the STCP . Finally, we thoroughly study the computational properties of both algorithms in an educational context when grouping students in a classroom into teams using actual-world data. Keywords: team composition, exact algorithms, heuristic algorithms, optimisation, coalition formation 1. Introduction Active learning refers to a broad range of teaching techniques that engage students to participate in all learning activities in the classes. Typically, active learning strategies involve a substantial amount of students working together within teams. They do not only acquire and retain the information better but also are more content with their classes [2]. Nevertheless, not all teams facilitate learning. For team-based learning to be effective, every team composed in the classroom needs to be heterogeneous, i.e. diverse in individuals' characteristics. Furthermore, having some significantly weaker teams and some significantly stronger teams is undesirable. Hence, the distribution of teams in a classroom must be balanced in the sense that all teams are more or less equally strong. Even though much research in the industrial, organisational, and educational psychology fields investigated what are the predictors of team success, to the best of our knowledge, there are no computational models to build teams for a given task that are broadly used in the classrooms. Frequently studied individual characteristics that influence team performance are competencies, personality traits, and gender [3, 4, 5, 6]. Some of those characteristics were also acknowledged by multiagent systems (MAS) research. The most studied characteristic in MAS research are competencies [7, 8, 9, 10, 11].