Kicking neural network design automation into high gear

MIT News

A new area in artificial intelligence involves using algorithms to automatically design machine-learning systems known as neural networks that are more accurate and efficient than those developed by human engineers. But this so-called neural architecture search (NAS) technique is computationally expensive. A state-of-the-art NAS algorithm recently developed by Google to run on a squad of graphics processing units (GPUs) took 48,000 GPU hours to produce a single convolutional neural network, which is used for image classification and detection tasks. Google has the wherewithal to run hundreds of GPUs and other specialized hardware in parallel, but that's out of reach for many others. In a paper being presented at the International Conference on Learning Representations in May, MIT researchers describe an NAS algorithm that can directly learn specialized convolutional neural networks (CNNs) for target hardware platforms -- when run on a massive image dataset -- in only 200 GPU hours, which could enable far broader use of these types of algorithms.
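
The paper's key idea is to make the search hardware-aware. As a rough illustration (not the authors' exact method), a differentiable NAS objective can fold a target platform's measured per-op latencies into the loss, so that gradient steps on the architecture parameters trade accuracy against expected on-device latency. In the sketch below, the latency table, the penalty weight lam, and the layer/op counts are all assumed for illustration:

```python
import numpy as np

# Hypothetical sketch of a latency-aware NAS objective: each layer chooses
# among candidate ops via a softmax over architecture parameters, and the
# expected on-device latency enters the loss as a differentiable penalty.

rng = np.random.default_rng(0)

n_layers, n_ops = 4, 3
# Measured per-op latencies (ms) for the target hardware -- assumed values.
latency_table = rng.uniform(1.0, 5.0, size=(n_layers, n_ops))
alpha = np.zeros((n_layers, n_ops))       # architecture parameters

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def expected_latency(alpha):
    # E[latency] = sum over layers of op-probabilities . op-latencies
    return (softmax(alpha) * latency_table).sum()

def latency_grad(alpha):
    # d/d alpha_k of sum_j p_j * l_j = p_k * (l_k - sum_j p_j * l_j),
    # i.e. the softmax Jacobian applied row-wise per layer.
    p = softmax(alpha)
    mean_lat = (p * latency_table).sum(axis=-1, keepdims=True)
    return p * (latency_table - mean_lat)

lam, lr = 0.1, 0.5
for _ in range(100):                      # descend on the latency penalty
    alpha -= lr * lam * latency_grad(alpha)

print("expected latency after search:", expected_latency(alpha))
print("chosen ops per layer:", softmax(alpha).argmax(axis=1))
```

Minimizing only the latency term drives each layer toward its fastest op; in a real search the task-loss gradient would pull against this penalty, so the chosen ops balance accuracy and speed for the given hardware.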


Auto Deep Compression by Reinforcement Learning Based Actor-Critic Structure

arXiv.org Machine Learning

Model compression is an effective technique that facilitates deploying neural network models on devices with limited computation and low power budgets. However, conventional compression techniques use hand-crafted features [2,3,12] and require experts to explore the large design space trading off model size, speed, and accuracy, which is usually suboptimal and time-consuming. This paper analyzes automatic deep compression (ADC), which leverages reinforcement learning to efficiently sample the design space and improve the compression quality of the model. State-of-the-art compression results are obtained without any human effort, in a completely automated way. With a 4-fold reduction in FLOPs, accuracy is 2.8% higher than that of the manually compressed model for VGG-16 on ImageNet.
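
To make the search setup concrete, here is a toy sketch of the reward structure such an agent optimizes: per-layer pruning ratios are the actions, and the reward is validation accuracy subject to a FLOP budget (roughly the 4-fold reduction quoted above). All numbers, and the random-search "agent" standing in for the actor-critic policy, are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Toy sketch of RL-based model compression: an agent picks a pruning
# ratio per layer; the environment scores the compressed model with a
# reward trading off accuracy against FLOPs, mirroring the fully
# automated search over the compression design space.

rng = np.random.default_rng(1)

layer_flops = np.array([200e6, 400e6, 400e6, 100e6])   # assumed per-layer FLOPs

def evaluate(ratios):
    # Stand-in for fine-tuning + validation: accuracy degrades smoothly
    # as layers are pruned harder (purely synthetic proxy).
    acc = 0.71 - 0.15 * np.mean(ratios ** 2)
    flops = np.sum(layer_flops * (1.0 - ratios))
    return acc, flops

def reward(ratios, flop_budget=0.25 * layer_flops.sum()):
    acc, flops = evaluate(ratios)
    if flops > flop_budget:           # hard constraint: ~4x FLOP reduction
        return -np.inf
    return acc                        # maximize accuracy under the budget

# A trivial random-search agent stands in for the actor-critic policy.
best_r, best_ratios = -np.inf, None
for _ in range(2000):
    ratios = rng.uniform(0.0, 0.95, size=len(layer_flops))
    r = reward(ratios)
    if r > best_r:
        best_r, best_ratios = r, ratios

print("best per-layer pruning ratios:", np.round(best_ratios, 2))
print("reward (accuracy proxy):", round(best_r, 4))
```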


Automatic Configuration of Deep Neural Networks with EGO

arXiv.org Machine Learning

Designing the architecture for an artificial neural network is a cumbersome task because of the numerous parameters to configure, including activation functions, layer types, and hyper-parameters. With the large number of parameters for most networks nowadays, it is intractable to find a good configuration for a given task by hand. In this paper an Efficient Global Optimization (EGO) algorithm is adapted to automatically optimize and configure convolutional neural network architectures. A configurable neural network architecture based solely on convolutional layers is proposed for the optimization. Without using any knowledge of the target problem and without any data augmentation techniques, it is shown that on several image classification tasks this approach is able to find network architectures that are competitive in prediction accuracy with the best hand-crafted ones in the literature. In addition, only a very small training budget (200 evaluations and 10 epochs of training) is spent on each optimized architecture, in contrast to the usual long training time of hand-crafted networks. Moreover, instead of the standard sequential evaluation in EGO, several candidate architectures are proposed and evaluated in parallel, which significantly reduces execution overhead and leads to efficient automation of deep neural network design.
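
EGO is the classic Bayesian-optimization loop: fit a Gaussian-process surrogate to the configurations evaluated so far, then pick the next candidates by maximizing expected improvement. The sketch below compresses the paper's CNN configuration space into a single toy parameter and uses a naive top-q batch rule in place of a proper parallel-EI criterion; the synthetic val_error function and all constants are assumptions for illustration:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(2)

def val_error(x):
    # Synthetic stand-in for "train 10 epochs, measure validation error".
    return np.sin(3 * x) * x + 0.1 * rng.standard_normal(x.shape)

X = rng.uniform(0, 2, size=(5, 1))         # initial design
y = val_error(X).ravel()

def expected_improvement(Xc, gp, y_best, xi=0.01):
    # EI for minimization: (y* - mu - xi) Phi(z) + sigma phi(z).
    mu, sigma = gp.predict(Xc, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu - xi) / sigma
    return (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)                            # surrogate of the error landscape
    cand = rng.uniform(0, 2, size=(200, 1))
    ei = expected_improvement(cand, gp, y.min())
    batch = cand[np.argsort(ei)[-3:]]       # q=3 candidates "in parallel"
    X = np.vstack([X, batch])
    y = np.concatenate([y, val_error(batch).ravel()])

print("best configuration:", X[y.argmin()].round(3), "error:", y.min().round(3))
```

Evaluating the top-q EI points per iteration is the simplest way to batch the loop; it trades some sample efficiency for wall-clock time, which matches the paper's motivation for parallel candidate evaluation.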


Finding Competitive Network Architectures Within a Day Using UCT

arXiv.org Machine Learning

The design of neural network architectures for a new data set is a laborious task which requires deep-learning expertise. In order to make deep learning available to a broader audience, automated methods for finding a neural network architecture are vital. Recently proposed methods can already achieve human expert-level performance. However, these methods have run times of months or even years of GPU computing time, ignoring hardware constraints faced by many researchers and companies. We propose the use of Monte Carlo planning in combination with two different UCT (upper confidence bound applied to trees) derivations to search for network architectures. We adapt the UCT algorithm to the needs of network architecture search by proposing two ways of sharing information between different branches of the search tree. In an empirical study we demonstrate that this method finds competitive networks for MNIST, SVHN and CIFAR-10 in just a single GPU day. Extending the search time to five GPU days, we are able to outperform human-designed architectures as well as competing methods that consider the same types of layers.
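
The UCT rule underlying this search scores each child of a search-tree node by its mean reward plus an exploration bonus. The following toy sketch applies plain UCB1 to a depth-3 layer-choice tree with a synthetic reward; the paper's two information-sharing variants between branches are not reproduced here, and all names and constants are illustrative:

```python
import math, random

# Toy UCT search over a depth-3 "architecture" tree where each level
# picks a layer type; the reward is a synthetic accuracy stand-in.

LAYER_TYPES = ["conv3", "conv5", "pool"]
DEPTH = 3
C = 1.4                                   # exploration constant

def reward(arch):
    # Synthetic: prefer conv3 early and pool late (illustrative only).
    score = sum(1.0 for i, l in enumerate(arch)
                if (l == "conv3" and i < 2) or (l == "pool" and i == 2))
    return score / DEPTH + random.gauss(0, 0.05)

stats = {}                                # prefix tuple -> [visits, total reward]

def uct_child(prefix):
    parent_visits = stats.get(prefix, [1, 0.0])[0]
    def score(layer):
        v, r = stats.get(prefix + (layer,), [0, 0.0])
        if v == 0:
            return float("inf")           # expand unvisited children first
        return r / v + C * math.sqrt(math.log(parent_visits) / v)
    return max(LAYER_TYPES, key=score)

for _ in range(500):
    arch = ()
    while len(arch) < DEPTH:              # selection / expansion
        arch = arch + (uct_child(arch),)
    r = reward(arch)
    for i in range(DEPTH + 1):            # backpropagate along the path
        node = stats.setdefault(arch[:i], [0, 0.0])
        node[0] += 1
        node[1] += r

best = max((k for k in stats if len(k) == DEPTH),
           key=lambda k: stats[k][1] / stats[k][0])
print("best architecture found:", best)
```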


A Review of Meta-Reinforcement Learning for Deep Neural Networks Architecture Search

arXiv.org Artificial Intelligence

Plain network design is generally kept as a first step when applying proposed approaches ([28], [42]), since it leads to simple networks and allows focusing on the method itself before switching to more complex structures with modular design ([29], [49]). A third option, used in design approaches at a smaller scale, is predicting the rewards of explored architectures before fully training the most promising ones ([25], [29]). This training-acceleration technique is implemented to improve performance and requires further attention to control the possible impact of bias on the models' behavior. The success of current reinforcement-learning-based approaches to designing CNN architectures is widely proven, especially for image classification tasks. However, it is achieved at the cost of high computational resources, despite the acceleration attempts of most recent models. This prevents individual researchers and small research entities (companies and laboratories) from fully accessing this innovative technology [42]. Hence, deeper and more revolutionary optimization methods are required to make automatic CNN design practical. Transformation approaches based on extended network morphisms [49] are among the first attempts in this direction; they achieved a drastic decrease in computational cost and demonstrated generalization capacity. Additional future directions for controlling the complexity of automatic design are to develop methods for multi-task problems [58] and weight sharing [59] in order to benefit from knowledge transfer.
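
The network morphisms mentioned above rest on function-preserving transformations: an architecture can be widened or deepened so that the new network computes exactly the same function as the old one, which lets a search grow architectures without retraining from scratch. A minimal Net2Net-style widening sketch (illustrative, not the cited papers' exact operators) follows:

```python
import numpy as np

# Function-preserving widening: duplicate a hidden unit and split its
# outgoing weights so the widened network computes the same function.

rng = np.random.default_rng(3)

W1 = rng.standard_normal((4, 3))          # input -> hidden (3 units)
W2 = rng.standard_normal((3, 2))          # hidden -> output

def forward(x, W1, W2):
    h = np.maximum(x @ W1, 0.0)           # ReLU hidden layer
    return h @ W2

def widen(W1, W2, unit):
    # Copy the chosen unit's incoming weights; halve its outgoing
    # weights and give the copy the other half: outputs are unchanged.
    W1w = np.hstack([W1, W1[:, unit:unit + 1]])
    W2w = np.vstack([W2, W2[unit:unit + 1, :] / 2.0])
    W2w[unit, :] /= 2.0
    return W1w, W2w

x = rng.standard_normal((5, 4))
W1w, W2w = widen(W1, W2, unit=1)
assert np.allclose(forward(x, W1, W2), forward(x, W1w, W2w))
print("widened net is function-preserving:", W1w.shape, W2w.shape)
```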