Search
sharpDARTS: Faster and More Accurate Differentiable Architecture Search
Hundt, Andrew, Jain, Varun, Hager, Gregory D.
Neural Architecture Search (NAS) has been a source of dramatic improvements in neural network design, with recent results meeting or exceeding the performance of hand-tuned architectures. However, our understanding of how to represent the search space for neural net architectures and how to search that space efficiently are both still in their infancy. We have performed an in-depth analysis to identify limitations in a widely used search space and a recent architecture search method, Differentiable Architecture Search (DARTS). These findings led us to introduce novel network blocks with a more general, balanced, and consistent design; a better-optimized Cosine Power Annealing learning rate schedule; and other improvements. Our resulting sharpDARTS search is 50% faster with a 20-30% relative improvement in final model error on CIFAR-10 when compared to DARTS. Our best single model run has 1.93% (1.98+/-0.07) validation error on CIFAR-10 and 5.5% error (5.8+/-0.3) on the recently released CIFAR-10.1 test set. To our knowledge, both are state of the art for models of similar size. This model also generalizes competitively to ImageNet at 25.1% top-1 (7.8% top-5) error. We found improvements for existing search spaces but does DARTS generalize to new domains? We propose Differentiable Hyperparameter Grid Search and the HyperCuboid search space, which are representations designed to leverage DARTS for more general parameter optimization. Here we find that DARTS fails to generalize when compared against a human's one shot choice of models. We look back to the DARTS and sharpDARTS search spaces to understand why, and an ablation study reveals an unusual generalization gap. We finally propose Max-W regularization to solve this problem, which proves significantly better than the handmade design. Code will be made available.
Improving Search with Supervised Learning in Trick-Based Card Games
Solinas, Christopher, Rebstock, Douglas, Buro, Michael
In trick-taking card games, a two-step process of state sampling and evaluation is widely used to approximate move values. While the evaluation component is vital, the accuracy of move value estimates is also fundamentally linked to how well the sampling distribution corresponds the true distribution. Despite this, recent work in trick-taking card game AI has mainly focused on improving evaluation algorithms with limited work on improving sampling. In this paper, we focus on the effect of sampling on the strength of a player and propose a novel method of sampling more realistic states given move history. In particular, we use predictions about locations of individual cards made by a deep neural network --- trained on data from human gameplay - in order to sample likely worlds for evaluation. This technique, used in conjunction with Perfect Information Monte Carlo (PIMC) search, provides a substantial increase in cardplay strength in the popular trick-taking card game of Skat.
Exploiting Promising Sub-Sequences of Jobs to solve the No-Wait Flowshop Scheduling Problem
Mousin, Lucien, Kessaci, Marie-Elรฉonore, Dhaenens, Clarisse
The no-wait flowshop scheduling problem is a variant of the classical permutation flowshop problem, with the additional constraint that jobs have to be processed by the successive machines without waiting time. To efficiently address this NP-hard combinatorial optimization problem we conduct an analysis of the structure of good quality solutions. This analysis shows that the No-Wait specificity gives them a common structure: they share identical sub-sequences of jobs, we call super-jobs. After a discussion on the way to identify these super-jobs, we propose IG-SJ, an algorithm that exploits super-jobs within the state-of-the-art algorithm for the classical permutation flowshop, the well-known Iterated Greedy (IG) algorithm. An iterative approach of IG-SJ is also proposed. Experiments are conducted on Taillard's instances. The experimental results show that exploiting super-jobs is successful since IG-SJ is able to find 64 new best solutions.
Biasing MCTS with Features for General Games
Soemers, Dennis J. N. J., Piette, รric, Browne, Cameron
This paper proposes using a linear function approximator, rather than a deep neural network (DNN), to bias a Monte Carlo tree search (MCTS) player for general games. This is unlikely to match the potential raw playing strength of DNNs, but has advantages in terms of generality, interpretability and resources (time and hardware) required for training. Features describing local patterns are used as inputs. The features are formulated in such a way that they are easily interpretable and applicable to a wide range of general games, and might encode simple local strategies. We gradually create new features during the same self-play training process used to learn feature weights. We evaluate the playing strength of an MCTS player biased by learnt features against a standard upper confidence bounds for trees (UCT) player in multiple different board games, and demonstrate significantly improved playing strength in the majority of them after a small number of self-play training games.
Efficient Search-Based Weighted Model Integration
Zeng, Zhe, Broeck, Guy Van den
Weighted model integration (WMI) extends Weighted model counting (WMC) to the integration of functions over mixed discrete-continuous domains. It has shown tremendous promise for solving inference problems in graphical models and probabilistic programming. Yet, state-of-the-art tools for WMI are limited in terms of performance and ignore the independence structure that is crucial to improving efficiency. To address this limitation, we propose an efficient model integration algorithm for theories with tree primal graphs. We exploit the sparse graph structure by using search to performing integration. Our algorithm greatly improves the computational efficiency on such problems and exploits context-specific independence between variables. Experimental results show dramatic speedups compared to existing WMI solvers on problems with tree-shaped dependencies.
Learning Mixtures of Separable Dictionaries for Tensor Data: Analysis and Algorithms
Ghassemi, Mohsen, Shakeri, Zahra, Sarwate, Anand D., Bajwa, Waheed U.
This work addresses the problem of learning sparse representations of tensor data using structured dictionary learning. It proposes learning a mixture of separable dictionaries to better capture the structure of tensor data by generalizing the separable dictionary learning model. Two different approaches for learning mixture of separable dictionaries are explored and sufficient conditions for local identifiability of the underlying dictionary are derived in each case. Moreover, computational algorithms are developed to solve the problem of learning mixture of separable dictionaries in both batch and online settings. Numerical experiments are used to show the usefulness of the proposed model and the efficacy of the developed algorithms.
Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly
Kandasamy, Kirthevasan, Vysyaraju, Karun Raju, Neiswanger, Willie, Paria, Biswajit, Collins, Christopher R., Schneider, Jeff, Poczos, Barnabas, Xing, Eric P.
Bayesian Optimisation (BO), refers to a suite of techniques for global optimisation of expensive black box functions, which use introspective Bayesian models of the function to efficiently find the optimum. While BO has been applied successfully in many applications, modern optimisation tasks usher in new challenges where conventional methods fail spectacularly. In this work, we present Dragonfly, an open source Python library for scalable and robust BO. Dragonfly incorporates multiple recently developed methods that allow BO to be applied in challenging real world settings; these include better methods for handling higher dimensional domains, methods for handling multi-fidelity evaluations when cheap approximations of an expensive function are available, methods for optimising over structured combinatorial spaces, such as the space of neural network architectures, and methods for handling parallel evaluations. Additionally, we develop new methodological improvements in BO for selecting the Bayesian model, selecting the acquisition function, and optimising over complex domains with different variable types and additional constraints. We compare Dragonfly to a suite of other packages and algorithms for global optimisation and demonstrate that when the above methods are integrated, they enable significant improvements in the performance of BO. The Dragonfly library is available at dragonfly.github.io.
Distributed Gibbs: A Linear-Space Sampling-Based DCOP Algorithm
Nguyen, Duc Thien, Yeoh, William, Lau, Hoong Chuin, Zivan, Roie
Researchers have used distributed constraint optimization problems (DCOPs) to model various multi-agent coordination and resource allocation problems. Very recently, Ottens et al. proposed a promising new approach to solve DCOPs that is based on confidence bounds via their Distributed UCT (DUCT) sampling-based algorithm. Unfortunately, its memory requirement per agent is exponential in the number of agents in the problem, which prohibits it from scaling up to large problems. Thus, in this article, we introduce two new sampling-based DCOP algorithms called Sequential Distributed Gibbs (SD-Gibbs) and Parallel Distributed Gibbs (PD-Gibbs). Both algorithms have memory requirements per agent that is linear in the number of agents in the problem. Our empirical results show that our algorithms can find solutions that are better than DUCT, run faster than DUCT, and solve some large problems that DUCT failed to solve due to memory limitations.
Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement
Kool, Wouter, van Hoof, Herke, Welling, Max
The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. We show how to implicitly apply this 'Gumbel-Top-$k$' trick on a factorized distribution over sequences, allowing to draw exact samples without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows only linear in $k$ and the maximum sampled sequence length. The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct low-variance estimators for expected sentence-level BLEU score and model entropy.
Generating and Sampling Orbits for Lifted Probabilistic Inference
Holtzen, Steven, Millstein, Todd, Broeck, Guy Van den
Lifted inference scales to large probability models by exploiting symmetry. However, existing exact lifted inference techniques do not apply to general factor graphs, as they require a relational representation. In this work we provide a theoretical framework and algorithm for performing exact lifted inference on symmetric factor graphs by computing colored graph automorphisms, as is often done for approximate lifted inference. Our key insight is to represent variable assignments directly in the colored factor graph encoding. This allows us to generate representatives and compute the size of each orbit of the symmetric distribution. In addition to exact inference, we use this encoding to implement an MCMC algorithm that explores the space of orbits quickly by uniform orbit sampling.