InstaNAS: Instance-aware Neural Architecture Search

arXiv.org Machine Learning

Neural Architecture Search (NAS) aims at finding one "single" architecture that achieves the best accuracy for a given task such as image recognition. In this paper, we study instance-level variation and demonstrate that instance-awareness is an important yet currently missing component of NAS. Based on this observation, we propose InstaNAS for searching instance-level architectures; the controller is trained to search for and form a "distribution of architectures" instead of a single final architecture. During the inference phase, the controller selects an architecture from this distribution, tailored to each unseen image, to achieve both high accuracy and low latency. The experimental results show that InstaNAS reduces inference latency without compromising classification accuracy. On average, InstaNAS achieves a 48.9% latency reduction on CIFAR-10 and a 40.2% latency reduction on CIFAR-100 relative to the MobileNetV2 architecture.
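
To make the instance-aware inference idea concrete, here is a minimal sketch (in PyTorch) of a controller that maps each input image to per-block selection decisions over a shared set of candidate blocks. All module names, sizes, and the 0.5 thresholding rule are illustrative assumptions, not InstaNAS's exact design.

```python
# Sketch of instance-aware architecture selection: a tiny controller decides,
# per image, which candidate blocks of a shared backbone to execute.
import torch
import torch.nn as nn

class TinyController(nn.Module):
    """Maps an input image to per-block selection probabilities."""
    def __init__(self, num_blocks: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(8, num_blocks)

    def forward(self, x):
        return torch.sigmoid(self.head(self.features(x)))

class InstanceAwareNet(nn.Module):
    """A chain of candidate blocks; each block is executed or skipped per image."""
    def __init__(self, num_blocks: int = 4, width: int = 16):
        super().__init__()
        self.stem = nn.Conv2d(3, width, kernel_size=3, padding=1)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
            for _ in range(num_blocks)
        )
        self.classifier = nn.Linear(width, 10)
        self.controller = TinyController(num_blocks)

    def forward(self, x):
        probs = self.controller(x)            # (batch, num_blocks)
        keep = (probs > 0.5).float()          # hard per-instance decisions (assumed rule)
        h = self.stem(x)
        for i, block in enumerate(self.blocks):
            gate = keep[:, i].view(-1, 1, 1, 1)
            h = gate * block(h) + (1 - gate) * h   # identity path when a block is skipped
        h = h.mean(dim=(2, 3))
        return self.classifier(h), keep

model = InstanceAwareNet()
images = torch.randn(2, 3, 32, 32)            # two CIFAR-sized images
logits, decisions = model(images)
print(decisions)                              # per-image block selections (0 = skip, 1 = run)
```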


ShuffleNASNets: Efficient CNN models through modified Efficient Neural Architecture Search

arXiv.org Machine Learning

Neural network architectures found by sophisticated search algorithms achieve strikingly good test performance, surpassing most human-crafted network models by significant margins. Although computationally efficient, their designs are often very complex, which impairs execution speed. Additionally, finding models outside of the search space is not possible by design. While our search space is still limited, we incorporate otherwise undiscoverable expert knowledge into the economical search algorithm Efficient Neural Architecture Search (ENAS), guided by the design principles and architecture of ShuffleNet V2. While maintaining a baseline-like 2.85% test error on CIFAR-10, our ShuffleNASNets are significantly less complex, require fewer parameters, and are twice as fast as the ENAS baseline in a classification task.
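
As a rough illustration of the ShuffleNet V2 design principles referenced above, the sketch below implements a stride-1 ShuffleNet V2-style unit (channel split, a depthwise-separable branch, concatenation, channel shuffle) of the kind such a search space could use as a candidate operation. Channel counts and layer choices are assumptions for illustration, not the exact ShuffleNASNet cell.

```python
# Sketch of a ShuffleNet V2-style unit: split channels, transform one half with
# cheap pointwise/depthwise convolutions, concatenate, then shuffle channels.
import torch
import torch.nn as nn

def channel_shuffle(x, groups: int = 2):
    n, c, h, w = x.shape
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2)
             .reshape(n, c, h, w))

class ShuffleV2Unit(nn.Module):
    """Stride-1 unit: one half passes through unchanged, the other is transformed."""
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.ReLU(),
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False),  # depthwise
            nn.BatchNorm2d(half),
            nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.ReLU(),
        )

    def forward(self, x):
        a, b = x.chunk(2, dim=1)                     # channel split
        out = torch.cat([a, self.branch(b)], dim=1)  # identity half + transformed half
        return channel_shuffle(out, groups=2)

x = torch.randn(1, 32, 16, 16)
print(ShuffleV2Unit(32)(x).shape)   # torch.Size([1, 32, 16, 16])
```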


Neural Architecture Search with Reinforcement Learning

arXiv.org Artificial Intelligence

Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech, and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.65, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely used LSTM cell and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity points better than the previous state-of-the-art model. The cell can also be transferred to the character language modeling task on PTB, where it achieves a state-of-the-art perplexity of 1.214.
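
A minimal sketch of the controller-plus-reinforcement-learning loop described above follows: an LSTM controller samples architecture decisions token by token, and its parameters are updated with REINFORCE using the reward minus a moving-average baseline. The decision/choice counts are assumed sizes, and the toy proxy_reward function stands in for training each sampled child network and measuring its validation accuracy.

```python
# Sketch of an RNN controller trained with REINFORCE to propose architectures.
import torch
import torch.nn as nn
from torch.distributions import Categorical

NUM_DECISIONS, NUM_CHOICES, HIDDEN = 6, 4, 32   # e.g. 6 layers, 4 ops each (assumed)

class RNNController(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_CHOICES + 1, HIDDEN)   # +1 for a start token
        self.rnn = nn.LSTMCell(HIDDEN, HIDDEN)
        self.logits = nn.Linear(HIDDEN, NUM_CHOICES)

    def sample(self):
        """Autoregressively sample one architecture and its log-probability."""
        h = c = torch.zeros(1, HIDDEN)
        token = torch.tensor([NUM_CHOICES])                  # start token
        actions, log_prob = [], torch.zeros(1)
        for _ in range(NUM_DECISIONS):
            h, c = self.rnn(self.embed(token), (h, c))
            dist = Categorical(logits=self.logits(h))
            token = dist.sample()
            log_prob = log_prob + dist.log_prob(token)
            actions.append(token.item())
        return actions, log_prob

def proxy_reward(actions):
    # Placeholder: in the real method this is the validation accuracy of the
    # trained child network described by `actions`.
    return sum(actions) / (NUM_DECISIONS * (NUM_CHOICES - 1))

controller = RNNController()
opt = torch.optim.Adam(controller.parameters(), lr=1e-3)
baseline = 0.0
for step in range(100):
    arch, log_prob = controller.sample()
    reward = proxy_reward(arch)
    baseline = 0.9 * baseline + 0.1 * reward                 # moving-average baseline
    loss = -(reward - baseline) * log_prob.sum()             # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```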


Neural Architecture Search Over a Graph Search Space

arXiv.org Machine Learning

Neural architecture search (NAS) has enabled the discovery of state-of-the-art architectures in many domains. However, the success of NAS depends on the definition of the search space, i.e., the set of neural architectures that can be generated. State-of-the-art search spaces are defined as a static sequence of decisions with a set of available actions for each decision, where each possible sequence of actions defines an architecture. We propose a more expressive formulation of NAS that uses a graph search space. Our search space is defined as a graph in which each decision is a vertex and each action is an edge. Thus the sequence of decisions defining an architecture is not fixed but is determined dynamically by the actions selected. The proposed approach makes it possible to model iterative and branching aspects of the architecture design process, and in this form stronger priors about the search can be induced. We demonstrate basic iterative and branching search structures in simulation and show that using the graph representation improves sample efficiency.
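
A small sketch of the graph-structured search space idea follows: decision states are vertices, actions are edges, and an architecture is the action sequence produced by a walk over the graph, which naturally allows loops (iteration) and branching. The specific states and actions below are invented for illustration only.

```python
# Sketch: the search space as a graph of decision states; sampling an
# architecture is a walk over that graph rather than a fixed-length sequence.
import random

# vertex -> list of (action, next_vertex); "add_block" loops back to "start",
# so the number of decisions is not fixed in advance (iteration), while
# "branch" leads to a different decision state (branching).
SEARCH_GRAPH = {
    "start": [("add_block", "block"), ("finish", "end")],
    "block": [("conv3x3", "start"), ("conv5x5", "start"), ("branch", "merge")],
    "merge": [("concat", "start"), ("sum", "start")],
    "end":   [],
}

def sample_architecture(max_steps: int = 12, seed: int = 0):
    """Random walk over the graph; the visited edges define the architecture."""
    rng = random.Random(seed)
    vertex, actions = "start", []
    while SEARCH_GRAPH[vertex] and len(actions) < max_steps:
        action, vertex = rng.choice(SEARCH_GRAPH[vertex])
        actions.append(action)
    return actions

print(sample_architecture())   # e.g. ['add_block', 'conv5x5', 'finish']
```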


Transfer Learning to Learn with Multitask Neural Model Search

arXiv.org Machine Learning

Deep learning models require extensive architecture design exploration and hyperparameter optimization to perform well on a given task. The exploration of the model design space is often performed by a human expert and optimized using a combination of grid search and search heuristics over a large space of possible choices. Neural Architecture Search (NAS) is a Reinforcement Learning approach that has been proposed to automate architecture design. NAS has been successfully applied to generate Neural Networks that rival the best human-designed architectures. However, NAS requires sampling, constructing, and training hundreds to thousands of models to achieve well-performing architectures, and this procedure needs to be executed from scratch for each new task. The application of NAS to a wide set of tasks currently lacks a way to transfer generalizable knowledge across tasks. In this paper, we present the Multitask Neural Model Search (MNMS) controller. Our goal is to learn a generalizable framework that can condition model construction on successful model searches for previously seen tasks, thus significantly speeding up the search for new tasks. We demonstrate that MNMS can conduct an automated architecture search for multiple tasks simultaneously while still learning well-performing, specialized models for each task. We then show that pre-trained MNMS controllers can transfer what they have learned to new tasks. By leveraging knowledge from previous searches, we find that pre-trained MNMS models start from a better location in the search space and reduce search time on unseen tasks, while still discovering models that outperform published human-designed models.
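
To illustrate the task-conditioning idea in a sketch, the controller below receives a learned task embedding as its initial hidden state, so a single set of controller weights can drive searches for several tasks and be reused as a warm start on a new one. The sizes and the conditioning mechanism are assumptions for illustration, not the exact MNMS controller.

```python
# Sketch of a multitask controller: one RNN, conditioned on a task embedding,
# proposes architectures for different tasks with shared weights.
import torch
import torch.nn as nn
from torch.distributions import Categorical

NUM_TASKS, NUM_DECISIONS, NUM_CHOICES, HIDDEN = 3, 5, 4, 32   # assumed sizes

class MultitaskController(nn.Module):
    def __init__(self):
        super().__init__()
        self.task_embed = nn.Embedding(NUM_TASKS, HIDDEN)      # conditions the search
        self.step_embed = nn.Embedding(NUM_CHOICES + 1, HIDDEN)
        self.rnn = nn.LSTMCell(HIDDEN, HIDDEN)
        self.logits = nn.Linear(HIDDEN, NUM_CHOICES)

    def sample(self, task_id: int):
        h = self.task_embed(torch.tensor([task_id]))            # task-specific start state
        c = torch.zeros_like(h)
        token, actions = torch.tensor([NUM_CHOICES]), []        # start token
        for _ in range(NUM_DECISIONS):
            h, c = self.rnn(self.step_embed(token), (h, c))
            token = Categorical(logits=self.logits(h)).sample()
            actions.append(token.item())
        return actions

controller = MultitaskController()
print(controller.sample(task_id=0))   # architecture proposed for task 0
print(controller.sample(task_id=1))   # same weights, different task conditioning
```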