Search
ACE: An Actor Ensemble Algorithm for Continuous Control with Tree Search
Zhang, Shangtong, Chen, Hao, Yao, Hengshuai
In this paper, we propose an actor ensemble algorithm, named ACE, for continuous control with a deterministic policy in reinforcement learning. In ACE, we use an actor ensemble (i.e., multiple actors) to search for the global maxima of the critic. Besides the ensemble perspective, we also formulate ACE in the option framework by extending the option-critic architecture with deterministic intra-option policies, revealing a relationship between ensemble and options. Furthermore, we perform a look-ahead tree search with those actors and a learned value prediction model, resulting in a refined value estimation. We demonstrate a significant performance boost of ACE over DDPG and its variants in challenging physical robot simulators.
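The ensemble idea in the abstract above can be illustrated with a minimal sketch: several actors each propose an action, and the proposal with the highest critic value is executed. The function names, the toy critic, and the constant actors below are assumptions made for illustration; they are not taken from the paper, and the tree search and learned model are not shown.

```python
from typing import Callable, List, Sequence

State = Sequence[float]
Action = float


def ensemble_action(
    state: State,
    actors: List[Callable[[State], Action]],
    critic: Callable[[State, Action], float],
) -> Action:
    """Return the actor proposal that the critic values most."""
    proposals = [actor(state) for actor in actors]
    return max(proposals, key=lambda action: critic(state, action))


def toy_critic(state: State, action: Action) -> float:
    # Hypothetical critic with two local maxima in the action:
    # a low peak near -1.0 and a higher peak near +2.0.
    return max(1.0 - (action + 1.0) ** 2, 3.0 - (action - 2.0) ** 2)


# Hypothetical actors that have converged to different regions.
toy_actors = [lambda s: -0.9, lambda s: 1.8, lambda s: 0.0]

best_action = ensemble_action([0.0], toy_actors, toy_critic)
```

A single actor stuck near the lower peak would keep proposing an action around -1.0; with several actors, the proposal near the higher peak is selected instead.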
Adaptive Stress Testing: Finding Failure Events with Reinforcement Learning
Lee, Ritchie, Mengshoel, Ole J., Saksena, Anshu, Gardner, Ryan, Genin, Daniel, Silbermann, Joshua, Owen, Michael, Kochenderfer, Mykel J.
Finding the most likely path to a set of failure states is important to the analysis of safety-critical dynamic systems. While efficient solutions exist for certain classes of systems, a scalable general solution for stochastic, partially-observable, and continuous-valued systems remains challenging. Existing approaches in formal and simulation-based methods either cannot scale to large systems or are computationally inefficient. This paper presents adaptive stress testing (AST), a framework for searching a simulator for the most likely path to a failure event. We formulate the problem as a Markov decision process and use reinforcement learning to optimize it. The approach is simulation-based and does not require internal knowledge of the system. As a result, the approach is very suitable for black box testing of large systems. We present formulations for both systems where the state is fully-observable and partially-observable. In the latter case, we present a modified Monte Carlo tree search algorithm that only requires access to the pseudorandom number generator of the simulator to overcome partial observability. We also present an extension of the framework, called differential adaptive stress testing (DAST), that can be used to find failures that occur in one system but not in another. This type of differential analysis is useful in applications such as regression testing, where one is concerned with finding areas of relative weakness compared to a baseline. We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where we stress test a prototype aircraft collision avoidance system to find high-probability scenarios of near mid-air collisions.
Towards a Near Universal Time Series Data Mining Tool: Introducing the Matrix Profile
Yeh, Chin-Chia Michael
Doctor of Philosophy, Graduate Program in Computer Science, University of California, Riverside, September 2018. Dr. Eamonn Keogh, Chairperson. The last decade has seen a flurry of research on all-pairs-similarity-search (or, self-join) for text, DNA, and a handful of other datatypes, and these systems have been applied to many diverse data mining problems. Surprisingly, however, little progress has been made on addressing this problem for time series subsequences. In this thesis, we have introduced a near universal time series data mining tool called matrix profile which solves the all-pairs-similarity-search problem and caches the output in an easy-to-access fashion. The proposed algorithm is not only parameter-free, exact and scalable, but also applicable for both single and multidimensional time series. By building time series data mining methods on top of matrix profile, many time series data mining tasks (e.g., motif discovery, discord discovery, shapelet discovery, semantic segmentation, and clustering) can be efficiently solved. Because the same matrix profile can be shared by a diverse set of time series data mining methods, matrix profile is a versatile, computed-once-use-many-times data structure. We demonstrate the utility of matrix profile for many time series data mining problems, including motif discovery, discord discovery, weakly labeled time series classification, and representation learning on domains as diverse as seismology, entomology, music processing, bioinformatics, human activity monitoring, electrical power-demand monitoring, and medicine. We hope the matrix profile is not the end but the beginning of many more time series data mining projects.
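To make the all-pairs-similarity-search idea concrete, here is a brute-force sketch that, for every subsequence of length m, records the z-normalized Euclidean distance to its nearest non-overlapping neighbor and that neighbor's position. This is a quadratic-time illustration, not the scalable algorithm the thesis proposes; the function names and the exclusion-zone width of m // 2 are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def znorm(window: Sequence[float]) -> List[float]:
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0.0:
        return [0.0] * len(window)
    return [(v - mean) / std for v in window]


def matrix_profile(series: Sequence[float], m: int) -> Tuple[List[float], List[int]]:
    """Brute-force self-join: nearest-neighbor distance and index per subsequence."""
    n = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n)]
    exclusion = max(1, m // 2)  # assumed width; skips trivial self-matches
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= exclusion:
                continue
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


# Toy series in which the shape [0, 1, 3, 1, 0] occurs at positions 0 and 9.
toy_series = [0, 1, 3, 1, 0, 9, 2, 7, 4, 0, 1, 3, 1, 0]
toy_profile, toy_index = matrix_profile(toy_series, m=5)
```

In this toy series the lowest profile values mark the repeated shape (a motif), while the highest value marks the subsequence least like any other (a discord), which is how one cached profile can serve several mining tasks.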
Deep Optimisation: Solving Combinatorial Optimisation Problems using Deep Neural Networks
Caldwell, J. R., Watson, R. A., Thies, C., Knowles, J. D.
Deep Optimisation (DO) combines evolutionary search with Deep Neural Networks (DNNs) in a novel way - not for optimising a learning algorithm, but for finding a solution to an optimisation problem. Deep learning has been successfully applied to classification, regression, decision and generative tasks and in this paper we extend its application to solving optimisation problems. Model Building Optimisation Algorithms (MBOAs), a branch of evolutionary algorithms, have been successful in combining machine learning methods and evolutionary search but, until now, they have not utilised DNNs. DO is the first algorithm to use a DNN to learn and exploit the problem structure to adapt the variation operator (changing the neighbourhood structure of the search process). We demonstrate the performance of DO using two theoretical optimisation problems within the MAXSAT class. The Hierarchical Transformation Optimisation Problem (HTOP) has controllable deep structure that provides a clear evaluation of how DO works and why using a layerwise technique is essential for learning and exploiting problem structure. The Parity Modular Constraint Problem (MCparity) is a simplistic example of a problem containing higher-order dependencies (greater than pairwise) which DO can solve and state of the art MBOAs cannot. Further, we show that DO can exploit deep structure in TSP instances. Together these results show that there exist problems in which DO can find and exploit deep problem structure that other algorithms cannot. Making this connection between DNNs and optimisation allows for the utilisation of advanced tools applicable to DNNs that current MBOAs are unable to use.
eLIAN: Enhanced Algorithm for Angle-constrained Path Finding
Andreychuk, Anton, Soboleva, Natalia, Yakovlev, Konstantin
The paper considers the problem of finding 2D paths of a special shape, e.g. paths composed of line segments such that the angle between any two consecutive segments does not exceed a predefined threshold. This problem is harder to solve than the one in which shortest paths of any shape are sought, since the planner's search space is substantially bigger: multiple search nodes corresponding to the same location need to be considered. One way to reduce the search effort is to fix the length of the path's segment and to prune the nodes that violate the imposed constraint. This leads to incompleteness and to the sensitivity of the planner's performance to the chosen parameter value. In this work we introduce a novel technique that reduces this sensitivity by automatically adjusting the length of the path's segment on-the-fly, i.e. during the search. Embedding this technique into the known grid-based angle-constrained path finding algorithm, LIAN, leads to a notable increase in the planner's effectiveness, i.e. success rate, while keeping the efficiency overhead, i.e. runtime, at a reasonable level. Experimental evaluation shows that LIAN with the suggested enhancements, dubbed eLIAN, solves up to 20% more tasks than its predecessor. Meanwhile, the solution quality of eLIAN is nearly the same as that of LIAN.
Should Algorithms for Random SAT and Max-SAT be Different?
We analyze to what extent the random SAT and Max-SAT problems differ in their properties. Our findings suggest that for random $k$-CNF with ratio in a certain range, Max-SAT can be solved by any SAT algorithm with subexponential slowdown, while for formulae with ratios greater than some constant, algorithms under the random walk framework require substantially different heuristics. In light of these results, we propose a novel probabilistic approach for random Max-SAT called ProMS. Experimental results illustrate that ProMS outperforms many state-of-the-art local search solvers on random Max-SAT benchmarks.
Learning Beam Search Policies via Imitation Learning
Negrinho, Renato, Gormley, Matthew R., Gordon, Geoffrey J.
Beam search is widely used for approximate decoding in structured prediction problems. Models often use a beam at test time but ignore its existence at train time, and therefore do not explicitly learn how to use the beam. We develop a unifying meta-algorithm for learning beam search policies using imitation learning. In our setting, the beam is part of the model, and not just an artifact of approximate decoding. Our meta-algorithm captures existing learning algorithms and suggests new ones. It also lets us show novel no-regret guarantees for learning beam search policies.
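For readers unfamiliar with the decoding procedure this abstract refers to, the following is a minimal sketch of beam search with a fixed, hand-written scoring function. It shows only test-time decoding, not the imitation-learning meta-algorithm of the paper; the function names, the two-token vocabulary, and the transition scores are all hypothetical.

```python
from typing import Callable, List


def beam_search(
    start: str,
    expand: Callable[[str], List[str]],
    score: Callable[[str], float],
    beam_width: int,
    steps: int,
) -> str:
    """Keep the beam_width best partial outputs at each step; return the best one."""
    beam = [start]
    for _ in range(steps):
        candidates = [child for item in beam for child in expand(item)]
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beam, key=score)


# Hypothetical transition scores: "a" looks best first, but "bb" is best overall.
TRANSITIONS = {
    ("", "a"): 0.6, ("", "b"): 0.5,
    ("a", "a"): 0.1, ("a", "b"): 0.1,
    ("b", "a"): 0.0, ("b", "b"): 0.9,
}


def toy_score(seq: str) -> float:
    total, prev = 0.0, ""
    for token in seq:
        total += TRANSITIONS[(prev, token)]
        prev = token
    return total


def toy_expand(seq: str) -> List[str]:
    return [seq + token for token in "ab"]


greedy_result = beam_search("", toy_expand, toy_score, beam_width=1, steps=2)
wide_result = beam_search("", toy_expand, toy_score, beam_width=2, steps=2)
```

With a beam of width 1 the search commits to "a" and cannot recover, whereas width 2 keeps "b" alive long enough to find the higher-scoring "bb". A scoring model trained without regard to the beam has no incentive to rank such prefixes so that they survive pruning.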
Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation
O'Kelly, Matthew, Sinha, Aman, Namkoong, Hongseok, Duchi, John, Tedrake, Russ
Recent breakthroughs in deep learning have accelerated the development of autonomous vehicles (AVs); many research prototypes now operate on real roads alongside human drivers. While advances in computer-vision techniques have made human-level performance possible on narrow perception tasks such as object recognition, several fatal accidents involving AVs underscore the importance of testing whether the perception and control pipeline--when considered as a whole system--can safely interact with humans. Unfortunately, testing AVs in real environments, the most straightforward validation framework for system-level input-output behavior, requires prohibitive amounts of time due to the rare nature of serious accidents [49]. Concretely, a recent study [29] argues that AVs need to drive "hundreds of millions of miles and, under some scenarios, hundreds of billions of miles to create enough data to clearly demonstrate their safety." Alternatively, formally verifying an AV algorithm's "correctness" [34, 2, 47, 37] is difficult since all driving policies are subject to crashes caused by other drivers [49]. It is unreasonable to ask that the policy be safe under all scenarios. Unfortunately, ruling out scenarios where the AV should not be blamed is a task subject to logical inconsistency, combinatorial growth in specification complexity, and subjective assignment of fault. Motivated by the challenges underlying real-world testing and formal verification, we consider a probabilistic paradigm--which we call a risk-based framework--where our goal is to evaluate the probability of an accident under a base distribution representing standard traffic behavior.
Taking Human out of Learning Applications: A Survey on Automated Machine Learning
Yao, Quanming, Wang, Mengshuo, Escalante, Hugo Jair, Guyon, Isabelle, Hu, Yi-Qi, Li, Yu-Feng, Tu, Wei-Wei, Yang, Qiang, Yu, Yang
Machine learning techniques have become deeply rooted in our everyday life. However, since it is knowledge- and labor-intensive to pursue good learning performance, human experts are heavily engaged in every aspect of machine learning. In order to make machine learning techniques easier to apply and reduce the demand for experienced human experts, automatic machine learning (AutoML) has emerged as a hot topic in both industry and academia. In this paper, we provide a survey on existing AutoML works. First, we introduce and define the AutoML problem, with inspiration from both realms of automation and machine learning. Then, we propose a general AutoML framework that not only covers almost all existing approaches but also guides the design for new methods. Afterward, we categorize and review the existing works from two aspects, i.e., the problem setup and the employed techniques. Finally, we provide a detailed analysis of AutoML approaches and explain the reasons underneath their successful applications. We hope this survey can serve as not only an insightful guideline for AutoML beginners but also an inspiration for future research.
Differentiable Greedy Networks
Powers, Thomas, Fakoor, Rasool, Shakeri, Siamak, Sethy, Abhinav, Kainth, Amanjit, Mohamed, Abdel-rahman, Sarikaya, Ruhi
Optimal selection of a subset of items from a given set is a hard problem that requires combinatorial optimization. In this paper, we propose a subset selection algorithm that is trainable with gradient based methods yet achieves near optimal performance via submodular optimization. We focus on the task of identifying a relevant set of sentences for claim verification in the context of the FEVER task. Conventional methods for this task look at sentences on their individual merit and thus do not optimize the informativeness of sentences as a set. We show that our proposed method, which builds on the idea of unfolding a greedy algorithm into a computational graph, allows both interpretability and gradient based training. The proposed differentiable greedy network (DGN) outperforms discrete optimization algorithms as well as other baseline methods in terms of precision and recall. In this paper, we develop a subset selection algorithm that is differentiable and discrete, which can be trained on supervised data and can model complex dependencies between elements in a straightforward and comprehensible way. This is of particular interest in natural language processing tasks such as fact extraction, fact verification, and question answering where the proposed optimization scheme can be used for evidence retrieval.
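The greedy procedure that the abstract describes unfolding can be sketched in its plain, non-differentiable form: at each step, add the item with the largest marginal gain under a set-level objective. This is not the DGN itself; the function names, the coverage objective, and the toy sentences with the facts they cover are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set


def greedy_select(
    items: List[str],
    k: int,
    objective: Callable[[List[str]], float],
) -> List[str]:
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    selected: List[str] = []
    remaining = list(items)
    for _ in range(min(k, len(remaining))):
        base = objective(selected)
        best = max(remaining, key=lambda item: objective(selected + [item]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical evidence sentences and the facts each one covers.
COVERED_FACTS: Dict[str, Set[int]] = {
    "s1": {1, 2, 3},
    "s2": {1, 2},
    "s3": {4},
    "s4": {3},
}


def coverage(chosen: List[str]) -> float:
    """Submodular set objective: number of distinct facts covered."""
    facts: Set[int] = set()
    for name in chosen:
        facts |= COVERED_FACTS[name]
    return float(len(facts))


evidence = greedy_select(list(COVERED_FACTS), k=2, objective=coverage)
```

Ranking sentences on individual merit would pick s1 and s2, which overlap entirely; scoring the set picks s1 and s3 and covers all four facts.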