Search
Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning
Perrone, Valerio, Shen, Huibin, Seeger, Matthias, Archambeau, Cedric, Jenatton, Rodolphe
Bayesian optimization (BO) is a successful methodology to optimize black-box functions that are expensive to evaluate. While traditional methods optimize each black-box function in isolation, there has been recent interest in speeding up BO by transferring knowledge across multiple related black-box functions. In this work, we introduce a method to automatically design the BO search space by relying on evaluations of previous black-box functions. We depart from the common practice of defining a set of arbitrary search ranges a priori by considering search space geometries that are learned from historical data. This simple, yet effective strategy can be used to endow many existing BO methods with transfer learning properties. Despite its simplicity, we show that our approach considerably boosts BO by reducing the size of the search space, thus accelerating the optimization of a variety of black-box optimization problems. In particular, the proposed approach combined with random search results in a parameter-free, easy-to-implement, robust hyperparameter optimization strategy. We hope it will constitute a natural baseline for further research attempting to warm-start BO.
Interactive Decision Making for Autonomous Vehicles in Dense Traffic
Interactive Decision Making for Autonomous V ehicles in Dense Traffic David Isele 1 Abstract -- Dense urban traffic environments can produce situations where accurate prediction and dynamic models are insufficient for successful autonomous vehicle motion planning. We investigate how an autonomous agent can safely negotiate with other traffic participants, enabling the agent to handle potential deadlocks. Specifically we consider merges where the gap between cars is smaller than the size of the ego vehicle. We propose a game theoretic framework capable of generating and responding to interactive behaviors. Our main contribution is to show how game-tree decision making can be executed by an autonomous vehicle, including approximations and reasoning that make the tree-search computationally tractable. Additionally, to test our model we develop a stochastic rule-based traffic agent capable of generating interactive behaviors that can be used as a benchmark for simulating traffic participants in a crowded merge setting. I NTRODUCTION Much of the long tail around autonomous driving behavior relates to complex interactions between self-interested agents. Since other traffic participants exhibit a great deal of variety and are often neither purely adversarial, nor purely cooperative, it can be difficult to reason about their behavior. However this type of reasoning is essential to numerous traffic situations in congested traffic such as overcrowded merge scenarios depicted in Figure 1.
Action Selection for MDPs: Anytime AO* vs. UCT
In the presence of non-admissible heuristics, A* and other best-first algorithms can be converted into anytime optimal algorithms over OR graphs, by simply continuing the search after the first solution is found. The same trick, however, does not work for best-first algorithms over AND/OR graphs, that must be able to expand leaf nodes of the explicit graph that are not necessarily part of the best partial solution. Anytime optimal variants of AO* must thus address an exploration-exploitation tradeoff: they cannot just "exploit", they must keep exploring as well. In this work, we develop one such variant of AO* and apply it to finite-horizon MDPs. This Anytime AO* algorithm eventually delivers an optimal policy while using non-admissible random heuristics that can be sampled, as when the heuristic is the cost of a base policy that can be sampled with rollouts. We then test Anytime AO* for action selection over large infinite-horizon MDPs that cannot be solved with existing off-line heuristic search and dynamic programming algorithms, and compare it with UCT. Introduction One of the natural approaches for selecting actions in very large state spaces is by performing a limited amount of lookahead. In the contexts of discounted MDPs, Kearns, Mansour, and Ng have shown that near to optimal actions can be selected by considering a sampled lookahead tree that is sufficiently sparse, whose size depends on the discount factor and the suboptimality bound but not on the number of problem states (Kearns, Mansour, and Ng 1999).
A Quest for Structure: Jointly Learning the Graph Structure and Semi-Supervised Classification
Wu, Xuan, Zhao, Lingxiao, Akoglu, Leman
Semi-supervised learning (SSL) is effectively used for numerous classification problems, thanks to its ability to make use of abundant unlabeled data. The main assumption of various SSL algorithms is that the nearby points on the data manifold are likely to share a label. Graph-based SSL constructs a graph from point-cloud data as an approximation to the underlying manifold, followed by label inference. It is no surprise that the quality of the constructed graph in capturing the essential structure of the data is critical to the accuracy of the subsequent inference step [6]. How should one construct a graph from the input point-cloud data for graph-based SSL? In this work we introduce a new, parallel graph learning framework (called PG-learn) for the graph construction step of SSL. Our solution has two main ingredients: (1) a gradient-based optimization of the edge weights (more specifically, different kernel bandwidths in each dimension) based on a validation loss function, and (2) a parallel hyperparameter search algorithm with an adaptive resource allocation scheme. In essence, (1) allows us to search around a (random) initial hyperparameter configuration for a better one with lower validation loss. Since the search space of hyperparameters is huge for high-dimensional problems, (2) empowers our gradient-based search to go through as many different initial configurations as possible, where runs for relatively unpromising starting configurations are terminated early to allocate the time for others. As such, PG-learn is a carefully-designed hybrid of random and adaptive search. Through experiments on multi-class classification problems, we show that PG-learn significantly outperforms a variety of existing graph construction schemes in accuracy (per fixed time budget for hyperparameter tuning), and scales more effectively to high dimensional problems.
A Survey of Machine Learning Applied to Computer Architecture Design
Penney, Drew D., Chen, Lizhong
Machine learning has enabled significant benefits in diverse fields, but, with a few exceptions, has had limited impact on computer architecture. Recent work, however, has explored broader applicability for design, optimization, and simulation. Notably, machine learning based strategies often surpass prior state-of-the-art analytical, heuristic, and human-expert approaches. This paper reviews machine learning applied system-wide to simulation and run-time optimization, and in many individual components, including memory systems, branch predictors, networks-on-chip, and GPUs. The paper further analyzes current practice to highlight useful design strategies and identify areas for future work, based on optimized implementation strategies, opportune extensions to existing work, and ambitious long term possibilities. Taken together, these strategies and techniques present a promising future for increasingly automated architectural design.
Efficient Large-Scale Multi-Drone Delivery Using Transit Networks
Choudhury, Shushman, Solovey, Kiril, Kochenderfer, Mykel J., Pavone, Marco
We consider the problem of controlling a large fleet of drones to deliver packages simultaneously across broad urban areas. To conserve their limited flight range, drones can seamlessly hop between and ride on top of public transit vehicles (e.g., buses and trams). We design a novel comprehensive algorithmic framework that strives to minimize the maximum time to complete any delivery. We address the multifaceted complexity of the problem through a two-layer approach. First, the upper layer assigns drones to package delivery sequences with a provably near-optimal polynomial-time task allocation algorithm. Then, the lower layer executes the allocation by periodically routing the fleet over the transit network while employing efficient bounded-suboptimal multi-agent pathfinding techniques tailored to our setting. We present extensive experiments supporting the efficiency of our approach on settings with up to $200$ drones, $5000$ packages, and large transit networks of up to $8000$ stops in San Francisco and the Washington DC area. Our results show that the framework can compute solutions within a few seconds (up to $2$ minutes for the largest settings) on commodity hardware, and that drones travel up to $450 \%$ of their flight range by using public transit.
Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning
Kurin, Vitaly, Godil, Saad, Whiteson, Shimon, Catanzaro, Bryan
A BSTRACT We present GQSA T, a branching heuristic in a Boolean SA T solver trained with value-based reinforcement learning (RL) using Graph Neural Networks for function approximation. Solvers using GQSA T are complete SA T solvers that either provide a satisfying assignment or a proof of unsatisfiability, which is required for many SA T applications. The branching heuristic commonly used in SA T solvers today suffers from bad decisions during their warm-up period, whereas GQSA T has been trained to examine the structure of the particular problem instance to make better decisions at the beginning of the search. Training GQSA T is data efficient and does not require elaborate dataset preparation or feature engineering to train. We train GQSA T on small SA T problems using RL interfacing with an existing SA T solver. We show that GQSA T is able to reduce the number of iterations required to solve SA T problems by 2-3X, and it generalizes to unsatisfiable SA T instances, as well as to problems with 5X more variables than it was trained on. We also show that, to a lesser extent, it generalizes to SA T problems from different domains by evaluating it on graph coloring. Our experiments show that augmenting SA T solvers with agents trained with RL and graph neural networks can improve performance on the SA T search problem. 1 I NTRODUCTION Boolean satisfiability (SA T) is an important problem for both industry and academia impacting various fields, including circuit design, computer security, artificial intelligence, automatic theorem proving, and combinatorial optimization. As a result, modern SA T solvers are well-crafted, sophisticated, reliable pieces of software that can scale to problems with hundreds of thousands of variables (Ohrimenko et al., 2009). SA T is known to be NPcomplete (Karp, 1972), and most state-of-the-art open-source and commercial solvers rely on multiple heuristics to speed up the exhaustive search, which is otherwise intractable. These heuristics are usually meticulously crafted using expert domain knowledge and are often iterated on using trial and error. In this paper, we investigate how we can use machine learning to improve upon an existing branching heuristic without leveraging domain expertise.
Online Non-Monotone DR-submodular Maximization
Thang, Nguyen Kim, Srivastav, Abhinav
In this paper, we study problems at the interface of two important fields: \emph{submodular optimization} and \emph{online learning}. Submodular functions play a vital role in modelling cost functions that naturally arise in many areas of discrete optimization. These functions have been studied under various models of computation. Independently, submodularity has been considered in continuous domains. In fact, many problems arising in machine learning and statistics have been modelled using continuous DR-submodular functions. In this work, we are study the problem of maximizing \textit{non-monotone} continuous DR-submodular functions within the framework of online learning. We provide three main results. First, we present an online algorithm (in full-information setting) that achieves an approximation guarantee (depending on the search space) for the problem of maximizing non-monotone continuous DR-submodular functions over a \emph{general} convex domain. To best of our knowledge, no prior approximation algorithm in full-information setting was known for the non-monotone continuous DR submodular functions even for the \emph{down-closed} convex domain. Second, we show that the online stochastic mirror ascent algorithm (in full information setting) achieves an improved approximation ratio of $(1/4)$ for maximizing the non-monotone continuous DR-submodular functions over a \emph{down-closed} convex domain. At last, we extend our second result to the bandit setting where we present the first approximation guarantee of $(1/4)$. To best of our knowledge, no approximation algorithm for non-monotone submodular maximization was known in the bandit setting.
An Extensible and Personalizable Multi-Modal Trip Planner
Liu, Xudong, Fritz, Christian, Klenk, Matthew
Despite a tremendous amount of work in the literature and in the commercial sectors, current approaches to multi-modal trip planning still fail to consistently generate plans that users deem optimal in practice. We believe that this is due to the fact that current planners fail to capture the true preferences of users, e.g., their preferences depend on aspects that are not modeled. An example of this could be a preference not to walk through an unsafe area at night. We present a novel multi-modal trip planner that allows users to upload auxiliary geographic data (e.g., crime rates) and to specify temporal constraints and preferences over these data in combination with typical metrics such as time and cost. Concretely, our planner supports the modes walking, biking, driving, public transit, and taxi, uses linear temporal logic to capture temporal constraints, and preferential cost functions to represent preferences. We show by examples that this allows the expression of very interesting preferences and constraints that, naturally, lead to quite diverse optimal plans.
Temporal Planning with Intermediate Conditions and Effects
Valentini, Alessandro, Micheli, Andrea, Cimatti, Alessandro
Automated temporal planning is the technology of choice when controlling systems that can execute more actions in parallel and when temporal constraints, such as deadlines, are needed in the model. One limitation of several action-based planning systems is that actions are modeled as intervals having conditions and effects only at the extremes and as invariants, but no conditions nor effects can be specified at arbitrary points or sub-intervals. In this paper, we address this limitation by providing an effective heuristic-search technique for temporal planning, allowing the definition of actions with conditions and effects at any arbitrary time within the action duration. We experimentally demonstrate that our approach is far better than standard encodings in PDDL 2.1 and is competitive with other approaches that can (directly or indirectly) represent intermediate action conditions or effects.