NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search

arXiv.org Machine Learning

ABSTRACT One-shot neural architecture search (NAS) has played a crucial role in making NAS methods computationally feasible in practice. Nevertheless, there is still a lack of understanding of how exactly these weight-sharing algorithms work, owing to the many factors controlling the dynamics of the process. To allow a scientific study of these components, we introduce a general framework for one-shot NAS that can be instantiated to many recently introduced variants, and a general benchmarking framework that draws on the recent large-scale tabular benchmark NAS-Bench-101 for cheap anytime evaluations of one-shot NAS methods. The most crucial concept, which led to a reduction in search costs to the order of a single function evaluation, is certainly the weight-sharing paradigm: training only a single large architecture (the one-shot model) subsuming all the possible architectures in the search space (Brock et al., 2018; Pham et al., 2018). Despite the great advancements of these methods, the exact results of many NAS papers are often hard to reproduce (Li & Talwalkar, 2019; Yu et al., 2020; Yang et al., 2020). This is a result of several factors, such as unavailable original implementations, differences in the employed search spaces, training or evaluation pipelines, hyperparameter settings, and even pseudorandom number seeds (Lindauer & Hutter, 2019). One solution to guard against these problems would be a common library of NAS methods that provides primitives to construct different algorithm variants, similar to what RLlib (Liang et al., 2017) offers for the field of reinforcement learning. Our paper makes a first step in this direction. Furthermore, experiments in NAS can be computationally extremely costly, making it virtually impossible to perform proper scientific evaluations with many repeated runs to draw statistically robust conclusions. To address this issue, Ying et al. (2019) introduced NAS-Bench-101, a large tabular benchmark with 423k unique cell architectures, trained and fully evaluated using a one-time extreme amount of compute power (several months on thousands of TPUs), which now makes it possible to cheaply simulate an arbitrary number of runs of NAS methods, even on a laptop. NAS-Bench-101 enabled a comprehensive benchmarking of many discrete NAS optimizers (Zoph & Le, 2017; Real et al., 2019), using the exact same settings.
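The idea of replacing training runs with table lookups can be illustrated with a toy sketch. Everything below is hypothetical: the three-operation search space, the synthetic score formula, and the function names `build_toy_table` and `random_search` are assumptions made for illustration and do not reflect the NAS-Bench-101 data or API.

```python
import itertools
import random
from typing import Dict, List, Tuple

Arch = Tuple[str, ...]
OPS = ("conv3x3", "conv1x1", "maxpool3x3")  # illustrative operation labels


def build_toy_table(n_nodes: int = 4) -> Dict[Arch, float]:
    """Enumerate a tiny search space and attach a synthetic score to each entry.

    In a real tabular benchmark the scores would come from fully trained
    networks; here they are a made-up deterministic function of the encoding.
    """
    table = {}
    for arch in itertools.product(OPS, repeat=n_nodes):
        table[arch] = 0.90 + 0.01 * arch.count("conv3x3")
    return table


def random_search(table: Dict[Arch, float], n_queries: int, seed: int) -> List[float]:
    """Simulate one random-search run; each 'evaluation' is a dictionary lookup."""
    rng = random.Random(seed)
    archs = sorted(table)
    best = float("-inf")
    curve = []
    for _ in range(n_queries):
        best = max(best, table[rng.choice(archs)])
        curve.append(best)  # best-so-far (anytime) performance
    return curve


table = build_toy_table()
# Many repeated runs are cheap because no network is ever trained.
curves = [random_search(table, n_queries=20, seed=s) for s in range(100)]
```

Because every query is a lookup, running hundreds of seeds takes a fraction of a second, which is what makes statistically robust comparisons affordable.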


Search-Based Software Engineering for Self-Adaptive Systems: One Survey, Five Disappointments and Six Opportunities

#artificialintelligence

Search-Based Software Engineering (SBSE) is a promising paradigm that exploits computational search to optimize different processes when engineering complex software systems. Self-adaptive systems (SASs) are one category of such complex systems: they optimize different functional and non-functional objectives/criteria under a changing environment (e.g., requirements and workload), which involves problems that are subject to search. Accordingly, over the years, a considerable amount of work has investigated SBSE for SASs. In this paper, we provide the first systematic and comprehensive survey exclusively on SBSE for SASs, covering 3,740 papers in 27 venues from 7 repositories, which eventually leads to several key statistics from the 73 most notable primary studies in this particular field of research. Our results, surprisingly, reveal five disappointing issues that are of utmost importance but have been overwhelmingly ignored in existing studies.


Saturated Cost Partitioning for Optimal Classical Planning

Journal of Artificial Intelligence Research

Cost partitioning is a method for admissibly combining a set of admissible heuristic estimators by distributing operator costs among the heuristics. Computing an optimal cost partitioning, i.e., the operator cost distribution that maximizes the heuristic value, is often prohibitively expensive. Saturated cost partitioning is an alternative that is much faster to compute and has been shown to yield high-quality heuristics. However, its greedy nature makes it highly susceptible to the order in which the heuristics are considered. We propose a greedy algorithm to generate orders and show how to use hill-climbing search to optimize a given order. Combining both techniques leads to significantly better heuristic estimates than using the best random order that is generated in the same time. Since there is often no single order that gives good guidance on the whole state space, we use the maximum of multiple orders as a heuristic that is significantly better informed than any single-order heuristic, especially when we actively search for a set of diverse orders.


Adaptive Teaching of Temporal Logic Formulas to Learners with Preferences

arXiv.org Artificial Intelligence

Machine teaching is an algorithmic framework for teaching a target hypothesis via a sequence of examples or demonstrations. We investigate machine teaching for temporal logic formulas -- a novel and expressive hypothesis class amenable to time-related task specifications. In the context of teaching temporal logic formulas, an exhaustive search even for a myopic solution takes exponential time (with respect to the time span of the task). We propose an efficient approach for teaching parametric linear temporal logic formulas. Concretely, we derive a necessary condition for the minimal time length of a demonstration to eliminate a set of hypotheses. Utilizing this condition, we propose a myopic teaching algorithm by solving a sequence of integer programming problems. We further show that, under two notions of teaching complexity, the proposed algorithm has near-optimal performance. The results strictly generalize the previous results on teaching preference-based version space learners. We evaluate our algorithm extensively under a variety of learner types (i.e., learners with different preference models) and interactive protocols (e.g., batched and adaptive). The results show that the proposed algorithms can efficiently teach a given target temporal logic formula under various settings, and that there are significant gains of teaching efficacy when the teacher adapts to the learner's current hypotheses or uses oracles.


Learning the Hypotheses Space from data Part I: Learning Space and U-curve Property

arXiv.org Machine Learning

The agnostic PAC learning model consists of: a Hypothesis Space $\mathcal{H}$, a probability distribution $P$, a sample complexity function $m_{\mathcal{H}}(\epsilon,\delta): [0,1]^{2} \mapsto \mathbb{Z}_{+}$ of precision $\epsilon$ and confidence $1 - \delta$, a finite i.i.d. sample $\mathcal{D}_{N}$, a cost function $\ell$ and a learning algorithm $\mathbb{A}(\mathcal{H},\mathcal{D}_{N})$, which estimates $\hat{h} \in \mathcal{H}$ that approximates a target function $h^{\star} \in \mathcal{H}$ seeking to minimize out-of-sample error. In this model, prior information is represented by $\mathcal{H}$ and $\ell$, while problem solution is performed through their instantiation in several applied learning models, with specific algebraic structures for $\mathcal{H}$ and corresponding learning algorithms. However, these applied models use additional important concepts not covered by the classic PAC learning theory: model selection and regularization. This paper presents an extension of this model which covers these concepts. The main principle added is the selection, based solely on data, of a subspace of $\mathcal{H}$ with a VC-dimension compatible with the available sample. In order to formalize this principle, the concept of Learning Space $\mathbb{L}(\mathcal{H})$, which is a poset of subsets of $\mathcal{H}$ that covers $\mathcal{H}$ and satisfies a property regarding the VC dimension of related subspaces, is presented as the natural search space for model selection algorithms. A remarkable result obtained in this new framework is a set of conditions on $\mathbb{L}(\mathcal{H})$ and $\ell$ that lead to estimated out-of-sample error surfaces that are true U-curves on $\mathbb{L}(\mathcal{H})$ chains, enabling a more efficient search on $\mathbb{L}(\mathcal{H})$. Hence, in this new framework, the U-curve optimization problem becomes a natural component of model selection algorithms.


Bayesian optimization for backpropagation in Monte-Carlo tree search

arXiv.org Machine Learning

The robust nature of MCTS, versus a traditional approach like depth-first search in alpha-beta pruning, has not only enabled a leapfrog in performance in computer Go, but has also led to its utilization in other games where it is difficult to evaluate states, as well as in other domains (Browne et al., 2012). However, MCTS is known to suffer from slow convergence in certain situations (Coquelin and Munos, 2007), in particular when the precise calculation of a narrow tactical sequence is critical for success. For example, in board games, Ramanujan et al. (2010) define a level-k search trap for player p after a move m as a state of the game where the opponent of p has a guaranteed k-move winning strategy. More relevantly, they show through a series of experiments that MCTS performs poorly even in shallow traps, in contrast to regular minimax search; see also (Ramanujan et al., 2011; Ramanujan and Selman, 2011). To better understand this phenomenon, we take a closer look at the update rule $Q_n \leftarrow Q_{n-1} + \frac{R_{n-1} - Q_{n-1}}{n}$ (1), which is performed during the backpropagation phase of MCTS. Here, the current estimate of the value of a state is taken to be the simple average of all previous returns accrued upon visiting that state. Proceeding, we discuss various methods which seek to improve backpropagation by challenging the basic assumptions implied by (1): (i) Value estimation by averaging returns: Instead of updating a parent node's value with that of its MAX (MIN) child as in minimax search, backpropagation in MCTS averages all returns to obtain a good signal in noisy environments (this is ... arXiv:2001.09325v1
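The averaging update (1) can be sketched as an incremental running mean. This is a minimal illustration rather than code from the paper: the `Node` class and the `backpropagate` function are hypothetical names, returns are assumed to be scalars, and the same return is applied unchanged to every node on the path (no sign flip between players).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical MCTS node holding a visit count and a running value estimate."""
    visits: int = 0
    value: float = 0.0

    def update(self, ret: float) -> None:
        # Incremental form of a simple average: Q <- Q + (R - Q) / n,
        # where n is the visit count including the current visit.
        self.visits += 1
        self.value += (ret - self.value) / self.visits


def backpropagate(path: List[Node], ret: float) -> None:
    """Apply the averaging update to every node on the visited path."""
    for node in path:
        node.update(ret)


root, child = Node(), Node()
for r in [1.0, 0.0, 1.0, 1.0]:
    backpropagate([root, child], r)
# root.value now equals the mean of the four returns.
```

The incremental form avoids storing the full list of returns: only the count and the current mean are kept per node.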


NLocalSAT: Boosting Local Search with Solution Prediction

arXiv.org Artificial Intelligence

The Boolean satisfiability problem is a famous NP-complete problem in computer science. An effective approach to this problem is stochastic local search (SLS). However, in this method, the initial assignment is chosen at random, which impacts the effectiveness of SLS solvers. To address this problem, we propose NLocalSAT. NLocalSAT combines SLS with a solution prediction model, which boosts SLS by replacing the random initial assignment with one predicted by a neural network. We evaluated NLocalSAT on five SLS solvers (CCAnr, Sparrow, CPSparrow, YalSAT, and probSAT) with problems in the random track of SAT Competition 2018. The experimental results show that solvers with NLocalSAT achieve a 27% to 62% improvement over the original SLS solvers.
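The role of the initial assignment in SLS can be sketched with a small WalkSAT-style solver that accepts an optional starting assignment. This is a generic illustration under stated assumptions, not the NLocalSAT implementation: the function name `walksat`, its parameters, and the hand-written `predicted` assignment (standing in for a neural network's output) are all hypothetical.

```python
import random
from typing import Dict, List, Optional

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3


def walksat(clauses: List[Clause], n_vars: int,
            init: Optional[Dict[int, bool]] = None,
            max_flips: int = 10_000, noise: float = 0.5,
            seed: int = 0) -> Optional[Dict[int, bool]]:
    """WalkSAT-style local search; `init` overrides the random starting values."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    if init:
        assign.update(init)  # a predicted assignment replaces random values

    def satisfied(clause: Clause) -> bool:
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def cost_after_flip(var: int) -> int:
        # Number of unsatisfied clauses if `var` were flipped.
        assign[var] = not assign[var]
        count = sum(not satisfied(c) for c in clauses)
        assign[var] = not assign[var]
        return count

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)
        if rng.random() < noise:
            var = abs(rng.choice(clause))  # random-walk move
        else:
            var = min((abs(lit) for lit in clause), key=cost_after_flip)  # greedy move
        assign[var] = not assign[var]
    return None  # no satisfying assignment found within the flip budget


clauses = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
predicted = {1: True, 2: False, 3: True}  # stand-in for a model's prediction
warm = walksat(clauses, n_vars=3, init=predicted)
cold = walksat(clauses, n_vars=3)
```

When the predicted assignment already satisfies the formula, as in this toy case, the search returns without a single flip; a random start generally needs some flips first.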


On the Performance of Metaheuristics: A Different Perspective

arXiv.org Artificial Intelligence

Nowadays, we are immersed in dozens of newly proposed evolutionary and swarm-intelligence metaheuristics, which makes it very difficult to choose a proper one for a specific optimization problem at hand. On the other hand, most of these metaheuristics are nothing but slightly modified variants of the basic metaheuristics. For example, Differential Evolution (DE) or Shuffled Frog Leaping (SFL) are just Genetic Algorithms (GA) with a specialized operator or an extra local search, respectively. The question that arises, therefore, is whether the behavior of such newly proposed metaheuristics can be investigated by studying the specifications and characteristics of their ancestors. In this paper, a comprehensive evaluation study of some basic metaheuristics, i.e., Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC), Teaching-Learning-Based Optimization (TLBO), and Cuckoo Optimization Algorithm (COA), is conducted, which gives us a deeper insight into their performance so that we can better estimate the performance and applicability of all other variations originating from them. A large number of experiments have been conducted on 20 different combinatorial optimization benchmark functions with different characteristics, and the results reveal some fundamental conclusions besides the following ranking order among these metaheuristics: {ABC, PSO, TLBO, GA, COA}, i.e., ABC and COA are the best and the worst methods from the performance point of view, respectively. In addition, from the convergence perspective, PSO and ABC have significantly better convergence for unimodal and multimodal functions, respectively, while GA and COA converge prematurely to local optima in many cases, needing alternative mutation mechanisms to enhance diversification and global search.


Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks

arXiv.org Machine Learning

An online resource scheduling framework is proposed for minimizing the sum of weighted task latency for all the mobile users, by optimizing the offloading decision, transmission power, and resource allocation in the mobile edge computing (MEC) system. Towards this end, a deep reinforcement learning (DRL) method is proposed to obtain an online resource scheduling policy. Firstly, a related and regularized stacked auto encoder (2r-SAE) with unsupervised learning is proposed to perform data compression and representation for high-dimensional channel quality information (CQI) data, which can reduce the state space for DRL. Secondly, we present an adaptive simulated annealing based approach (ASA) as the action search method of DRL, in which an adaptive h-mutation is used to guide the search direction and an adaptive iteration is proposed to enhance the search efficiency during the DRL process. Thirdly, a preserved and prioritized experience replay (2p-ER) is introduced to assist the DRL to train the policy network and find the optimal offloading policy. Numerical results are provided to demonstrate that the proposed algorithm can achieve near-optimal performance while significantly decreasing the computational time compared with existing benchmarks. They also show that the proposed framework is suitable for resource scheduling problems in large-scale MEC networks, especially in dynamic environments.


Best Principal Submatrix Selection for the Maximum Entropy Sampling Problem: Scalable Algorithms and Performance Guarantees

arXiv.org Machine Learning

This paper studies a classic maximum entropy sampling problem (MESP), which aims to select the most informative principal submatrix of a prespecified size from a covariance matrix. MESP has been widely applied in many areas, including healthcare, power systems, manufacturing, and data science. By investigating its Lagrangian dual and primal characterization, we derive a novel convex integer program for MESP and show that its continuous relaxation yields a near-optimal solution. The results motivate us to study an efficient sampling algorithm and develop its approximation bound for MESP, which improves the best-known bound in the literature. We then provide an efficient deterministic implementation of the sampling algorithm with the same approximation bound. By developing new mathematical tools for singular matrices and analyzing the Lagrangian dual of the proposed convex integer program, we investigate the widely used local search algorithm and prove its first-known approximation bound for MESP. The proof techniques further inspire an efficient implementation of the local search algorithm. Our numerical experiments demonstrate that these approximation algorithms can efficiently solve medium-sized and large-scale instances to near-optimality. Our proposed algorithms are coded and released as open-source software. Finally, we extend the analyses to the A-Optimal MESP (A-MESP), where the objective is to minimize the trace of the inverse of the selected principal submatrix.
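A swap-based local search for MESP can be sketched as follows. This is a generic first-improvement variant written for illustration, assuming the objective is the log-determinant of the selected principal submatrix; it is not the paper's efficient implementation, and the names `logdet` and `local_search` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def logdet(cov: Sequence[Sequence[float]], idx: Sequence[int]) -> float:
    """Log-determinant of the principal submatrix of `cov` indexed by `idx`.

    Uses Gaussian elimination without pivoting, which is adequate for
    positive definite submatrices; returns -inf if a pivot is not positive.
    """
    a = [[cov[i][j] for j in idx] for i in idx]
    total = 0.0
    for k in range(len(a)):
        pivot = a[k][k]
        if pivot <= 0.0:
            return float("-inf")
        total += math.log(pivot)
        for i in range(k + 1, len(a)):
            factor = a[i][k] / pivot
            for j in range(k, len(a)):
                a[i][j] -= factor * a[k][j]
    return total


def local_search(cov: Sequence[Sequence[float]], s: int,
                 tol: float = 1e-12) -> Tuple[List[int], float]:
    """Swap one selected index for one unselected index while the objective improves."""
    n = len(cov)
    selected = list(range(s))  # arbitrary starting subset
    best = logdet(cov, selected)
    improved = True
    while improved:
        improved = False
        for pos in range(s):
            for j in range(n):
                if j in selected:
                    continue
                candidate = selected[:pos] + [j] + selected[pos + 1:]
                value = logdet(cov, candidate)
                if value > best + tol:
                    selected, best, improved = candidate, value, True
    return sorted(selected), best


# Diagonal covariance: the best subset is simply the largest variances.
cov = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 5.0, 0.0, 0.0],
       [0.0, 0.0, 2.0, 0.0],
       [0.0, 0.0, 0.0, 4.0]]
subset, value = local_search(cov, s=2)
```

For a diagonal matrix a swap-optimal subset is also globally optimal, which makes the toy case easy to check; for general covariance matrices the search only guarantees that no single swap improves the objective.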