Developing neural network image classification models often requires significant architecture engineering. In this paper, we attempt to automate this engineering process by learning the model architectures directly on the dataset of interest. As this approach is expensive when the dataset is large, we propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset. Our key contribution is the design of a new search space which enables transferability. In our experiments, we search for the best convolutional layer (or "cell") on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of this cell, each with their own parameters. Although the cell is not searched for directly on ImageNet, an architecture constructed from the best cell achieves, among the published works, state-of-the-art accuracy of 82.7% top-1 and 96.2% top-5 on ImageNet. Our model is 1.2% better in top-1 accuracy than the best human-invented architectures while having 9 billion fewer FLOPS -- a reduction of 28% in computational demand from the previous state-of-the-art model. When evaluated at different levels of computational cost, accuracies of our models exceed those of the state-of-the-art human-designed models. For instance, a smaller network constructed from the best cell also achieves 74% top-1 accuracy, which is 3.1% better than equivalently-sized, state-of-the-art models for mobile platforms. On CIFAR-10, an architecture constructed from the best cell achieves 2.4% error rate, which is also state-of-the-art. Finally, the image features learned from image classification can also be transferred to other computer vision problems. On the task of object detection, the learned features used with the Faster-RCNN framework surpass state-of-the-art by 4.0% achieving 43.1% mAP on the COCO dataset.
Cai, Han (Shanghai Jiao Tong University) | Chen, Tianyao (Shanghai Jiao Tong University) | Zhang, Weinan (Shanghai Jiao Tong University) | Yu, Yong (Shanghai Jiao Tong University) | Wang, Jun (University College London)
Techniques for automatically designing deep neural network architectures such as reinforcement learning based approaches have recently shown promising results. However, their success is based on vast computational resources (e.g. hundreds of GPUs), making them difficult to be widely used. A noticeable limitation is that they still design and train each network from scratch during the exploration of the architecture space, which is highly inefficient. In this paper, we propose a new framework toward efficient architecture search by exploring the architecture space based on the current network and reusing its weights. We employ a reinforcement learning agent as the meta-controller, whose action is to grow the network depth or layer width with function-preserving transformations. As such, the previously validated networks can be reused for further exploration, thus saves a large amount of computational cost. We apply our method to explore the architecture space of the plain convolutional neural networks (no skip-connections, branching etc.) on image benchmark datasets (CIFAR-10, SVHN) with restricted computational resources (5 GPUs). Our method can design highly competitive networks that outperform existing networks using the same design scheme. On CIFAR-10, our model without skip-connections achieves 4.23% test error rate, exceeding a vast majority of modern architectures and approaching DenseNet. Furthermore, by applying our method to explore the DenseNet architecture space, we are able to achieve more accurate networks with fewer parameters.
The plain network design is generally kept as a first step of proposed approaches application (, ) given that it leads to simple networks and allows to focus on the method itself before switching to more complex structures with modular design (, ). A third option used in design approaches at a lower scale is the prediction of explored architectures rewards before full training the most promising ones (, ). This training acceleration technique is implemented for performance improvement purpose and requires further attention to control possible bias impact on the models behavior. The success of current reinforcement-learning-based approaches to design CNN architectures is widely proven especially for image classification tasks. However, it is achieved at the cost of high computational resources despite the acceleration attempts of most of recent models. Such fact is preventing individual researchersand small research entities (companies and laboratories) from fully access to this innovative technology . Hence, deeper and more revolutionary optimizing methods are required to practically operate CNN automatic design. Transformation approaches based on extended network morphisms  are among the first attempts in this direction that achieved drastic decrease in computational cost and demonstrated generalization capacity. Additionalfuture directions to control automatic design complexity is to develop methods for multi-task problems  and weights sharing  in order to benefit from knowledge transfer contributions.
Latest algorithms for automatic neural architecture search perform remarkable but are basically directionless in search space and computational expensive in training of every intermediate architecture. In this paper, we propose a method for efficient architecture search called EENA (Efficient Evolution of Neural Architecture). Due to the elaborately designed mutation and crossover operations, the evolution process can be guided by the information have already been learned. Therefore, less computational effort will be required while the searching and training time can be reduced significantly. On CIFAR-10 classification, EENA using minimal computational resources (0.65 GPU days) can design highly effective neural architecture which achieves 2.56% test error with 8.47M parameters. Furthermore, the best architecture discovered is also transferable for CIFAR-100.
Abstract--Neural network architectures found by sophistic search algorithms achieve strikingly good test performance, surpassing most human-crafted network models by significant margins. Although computationally efficient, their design is often very complex, impairing execution speed. Additionally, finding models outside of the search space is not possible by design. While our space is still limited, we implement undiscoverable expert knowledge into the economic search algorithm Efficient Neural Architecture Search (ENAS), guided by the design principles and architecture of ShuffleNet V2. While maintaining baselinelike 2.85%test error on CIFAR-10, our ShuffleNASNets are significantly less complex, require fewer parameters, and are two times faster than the ENAS baseline in a classification task.