Search
On the Effect of Learned Clauses on Stochastic Local Search
Lorenz, Jan-Hendrik, Wörz, Florian
There are two competing paradigms in successful SAT solvers: Conflict-driven clause learning (CDCL) and stochastic local search (SLS). CDCL uses systematic exploration of the search space and has the ability to learn new clauses. SLS examines the neighborhood of the current complete assignment. Unlike CDCL, it lacks the ability to learn from its mistakes. This work revolves around the question whether it is beneficial for SLS to add new clauses to the original formula. We experimentally demonstrate that clauses with a large number of correct literals w. r. t. a fixed solution are beneficial to the runtime of SLS. We call such clauses high-quality clauses. Empirical evaluations show that short clauses learned by CDCL possess the high-quality attribute. We study several domains of randomly generated instances and deduce the most beneficial strategies to add high-quality clauses as a preprocessing step. The strategies are implemented in an SLS solver, and it is shown that this considerably improves the state-of-the-art on randomly generated instances. The results are statistically significant.
On the Optimality of Randomization in Experimental Design: How to Randomize for Minimax Variance and Design-Based Inference
I study the minimax-optimal design for a two-arm controlled experiment where conditional mean outcomes may vary in a given set. When this set is permutation symmetric, the optimal design is complete randomization, and using a single partition (i.e., the design that only randomizes the treatment labels for each side of the partition) has minimax risk larger by a factor of n 1. More generally, the optimal design is shown to be the mixed-strategy optimal design (MSOD) of Kallus (2018). Notably, even when the set of conditional mean outcomes has structure (i.e., is not permutation symmetric), being minimax-optimal for variance still requires randomization beyond a single partition. Nonetheless, since this targets precision, it may still not ensure sufficient uniformity in randomization to enable randomization (i.e., design-based) inference by Fisher's exact test to appropriately detect violations of null. I therefore propose the inferenceconstrained MSOD, which is minimax-optimal among all designs subject to such uniformity constraints. On the way, I discuss Johansson et al. (2020) who recently compared rerandomization of Morgan and Rubin (2012) and the pure-strategy optimal design (PSOD) of Kallus (2018). I point out some errors therein and set straight that randomization is minimax-optimal and that the "no free lunch" theorem and example in Kallus (2018) are correct. Keywords: Causal inference, controlled experiments, covariate balance, minimax, optimization.
Learning, transferring, and recommending performance knowledge with Monte Carlo tree search and neural networks
Making changes to a program to optimize its performance is an unscalable task that relies entirely upon human intuition and experience. In addition, companies operating at large scale are at a stage where no single individual understands the code controlling its systems, and for this reason, making changes to improve performance can become intractably difficult. In this paper, a learning system is introduced that provides AI assistance for finding recommended changes to a program. Specifically, it is shown how the evaluative feedback, delayed-reward performance programming domain can be effectively formulated via the Monte Carlo tree search (MCTS) framework. It is then shown that established methods from computational games for using learning to expedite tree-search computation can be adapted to speed up computing recommended program alterations. Estimates of expected utility from MCTS trees built for previous problems are used to learn a sampling policy that remains effective across new problems, thus demonstrating transferability of optimization knowledge. This formulation is applied to the Apache Spark distributed computing environment, and a preliminary result is observed that the time required to build a search tree for finding recommendations is reduced by up to a factor of 10x.
Generalized Planning With Deep Reinforcement Learning
Rivlin, Or, Hazan, Tamir, Karpas, Erez
Classical Planning is concerned with finding plans, or sequences of actions, that when applied to some initial condition specified by a set of logical predicates, will bring the environment to a state that satisfies a set of goal predicates. This is usually performed by some heuristic search procedure, and the resulting plan is applicable only to the specific instance that was solved. However, a possibly stronger outcome would be to find some sort of higher level plan that can solve many instances that belong to the same domain, and thus share an underlying structure. The study of methods that can discover such higher level plans is called Generalized Planning. Generalized plans do not necessarily exist for all classical planning domains, but finding such solutions for domains in which it is possible could obviate the need to perform compute intensive search in cases where we only wish to find a goal satisfying solution. To give an example of such a generalized plan, let us consider a simplified Blocksworld domain. In this domain there are unique blocks that can be either stacked on each other or strewn about the floor, and the goal is to stack and unstack blocks such that we arrive at a goal configuration from an initial configuration. Finding a plan that does so in an optimal number of steps is generally NPhard [10], but finding a plan that satisfies the goal regardless of cost can be done in polynomial time in the following manner: 1. Unstack all the blocks so that they are scattered on the floor 2. stack the block according to the goal configuration, beginning with the lower blocks This strategy is not optimal since we might unstack blocks that are already in their proper place according to the goal specification, but it will yield a goal satisfying plan for every instance in this simplified Blocksworld domain.
A Heuristic Based on Randomized Greedy Algorithms for the Clustered Shortest-Path Tree Problem
Thanh, Pham Dinh, Binh, Huynh Thi Thanh, Dac, Do Dinh, Long, Nguyen Binh, Phong, Le Minh Hai
Randomized Greedy Algorithms (RGAs) are interesting approaches to solve problems whose structures are not well understood as well as problems in combinatorial optimization which incorporate the random processes and the greedy algorithms. This paper introduces a new algorithm that combines the major features of RGAs and Shortest Path Tree Algorithm (SPTA) to deal with the Clustered Shortest-Path Tree Problem (CluSPT). In our algorithm, SPTA is used to determine the shortest path tree in each cluster while the combination between characteristics of the RGAs and search strategy of SPTA is used to constructed the edges connecting clusters. To evaluate the performance of the proposed algorithm, Euclidean benchmarks are selected. The experimental investigations show the strengths of the proposed algorithm in comparison with some existing algorithms. We also analyze the influence of the parameters on the performance of the algorithm.
ASNets: Deep Learning for Generalised Planning
Toyer, Sam (UC Berkeley) | Thiébaux, Sylvie (Australian National University) | Trevizan, Felipe (Australian National University) | Xie, Lexing (Australian National University)
In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets). The ASNet is a neural network architecture that exploits the relational structure of (P)PDDL planning problems to learn a common set of weights that can be applied to any problem in a domain. By mimicking the actions chosen by a traditional, non-learning planner on a handful of small problems in a domain, ASNets are able to learn a generalised reactive policy that can quickly solve much larger instances from the domain. This work extends the ASNet architecture to make it more expressive, while still remaining invariant to a range of symmetries that exist in PPDDL problems. We also present a thorough experimental evaluation of ASNets, including a comparison with heuristic search planners on seven probabilistic and deterministic domains, an extended evaluation on over 18,000 Blocksworld instances, and an ablation study. Finally, we show that sparsity-inducing regularisation can produce ASNets that are compact enough for humans to understand, yielding insights into how the structure of ASNets allows them to generalise across a domain.
Time Efficiency in Optimization with a Bayesian-Evolutionary Algorithm
Lan, Gongjin, Tomczak, Jakub M., Roijers, Diederik M., Eiben, A. E.
Not all generate-and-test search algorithms are created equal. Bayesian Optimization (BO) invests a lot of computation time to generate the candidate solution that best balances the predicted value and the uncertainty given all previous data, taking increasingly more time as the number of evaluations performed grows. Evolutionary Algorithms (EA) on the other hand rely on search heuristics that typically do not depend on all previous data and can be done in constant time. Both the BO and EA community typically assess their performance as a function of the number of evaluations. However, this is unfair once we start to compare the efficiency of these classes of algorithms, as the overhead times to generate candidate solutions are significantly different. We suggest to measure the efficiency of generate-and-test search algorithms as the expected gain in the objective value per unit of computation time spent. We observe that the preference of an algorithm to be used can change after a number of function evaluations. We therefore propose a new algorithm, a combination of Bayesian optimization and an Evolutionary Algorithm, BEA for short, that starts with BO, then transfers knowledge to an EA, and subsequently runs the EA. We compare the BEA with BO and the EA. The results show that BEA outperforms both BO and the EA in terms of time efficiency, and ultimately leads to better performance on well-known benchmark objective functions with many local optima. Moreover, we test the three algorithms on nine test cases of robot learning problems and here again we find that BEA outperforms the other algorithms.
On Learning Combinatorial Patterns to Assist Large-Scale Airline Crew Pairing Optimization
Aggarwal, Divyam, Singh, Yash Kumar, Saxena, Dhish Kumar
Airline Crew Pairing Optimization (CPO) aims at generating a set of legal flight sequences (crew pairings), to cover an airline's flight schedule, at minimum cost. It is usually performed using Column Generation (CG), a mathematical programming technique for guided search-space exploration. CG exploits the interdependencies between the current and the preceding CG-iteration for generating new variables (pairings) during the optimization-search. However, with the unprecedented scale and complexity of the emergent flight networks, it has become imperative to learn higher-order interdependencies among the flight-connection graphs, and utilize those to enhance the efficacy of the CPO. In first of its kind and what marks a significant departure from the state-of-the-art, this paper proposes a novel adaptation of the Variational Graph Auto-Encoder for learning plausible combinatorial patterns among the flight-connection data obtained through the search-space exploration by an Airline Crew Pairing Optimizer, AirCROP (developed by the authors and validated by the research consortium's industrial sponsor, GE Aviation). The resulting flight-connection predictions are combined on-the-fly using a novel heuristic to generate new pairings for the optimizer. The utility of the proposed approach is demonstrated on large-scale (over 4200 flights), real-world, complex flight-networks of US-based airlines, characterized by multiple hub-and-spoke subnetworks and several crew bases.
Adversarial Augmentation Policy Search for Domain and Cross-Lingual Generalization in Reading Comprehension
Maharana, Adyasha, Bansal, Mohit
Reading comprehension models often overfit to nuances of training datasets and fail at adversarial evaluation. Training with adversarially augmented dataset improves robustness against those adversarial attacks but hurts generalization of the models. In this work, we present several effective adversaries and automated data augmentation policy search methods with the goal of making reading comprehension models more robust to adversarial evaluation, but also improving generalization to the source domain as well as new domains and languages. We first propose three new methods for generating QA adversaries, that introduce multiple points of confusion within the context, show dependence on insertion location of the distractor, and reveal the compounding effect of mixing adversarial strategies with syntactic and semantic paraphrasing methods. Next, we find that augmenting the training datasets with uniformly sampled adversaries improves robustness to the adversarial attacks but leads to decline in performance on the original unaugmented dataset. We address this issue via RL and more efficient Bayesian policy search methods for automatically learning the best augmentation policy combinations of the transformation probability for each adversary in a large search space. Using these learned policies, we show that adversarial training can lead to significant improvements in in-domain, out-of-domain, and cross-lingual (German, Russian, Turkish) generalization without any use of training data from the target domain or language.
Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning
Parascandolo, Giambattista, Buesing, Lars, Merel, Josh, Hasenclever, Leonard, Aslanides, John, Hamrick, Jessica B., Heess, Nicolas, Neitz, Alexander, Weber, Theophane
Standard planners for sequential decision making (including Monte Carlo planning, tree search, dynamic programming, etc.) are constrained by an implicit sequential planning assumption: The order in which a plan is constructed is the same in which it is executed. We consider alternatives to this assumption for the class of goal-directed Reinforcement Learning (RL) problems. Instead of an environment transition model, we assume an imperfect, goal-directed policy. This low-level policy can be improved by a plan, consisting of an appropriate sequence of sub-goals that guide it from the start to the goal state. We propose a planning algorithm, Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS), for approximating the optimal plan by means of proposing intermediate sub-goals which hierarchically partition the initial tasks into simpler ones that are then solved independently and recursively. The algorithm critically makes use of a learned sub-goal proposal for finding appropriate partitions trees of new tasks based on prior experience. Different strategies for learning sub-goal proposals give rise to different planning strategies that strictly generalize sequential planning. We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds as well as in challenging continuous control environments.