Since 2017, AI researchers have been using AI neural networks to help design better and faster AI neural networks. Applying AI in pursuit of better AI has, to date, been a largely academic pursuit--mainly because this approach requires tens of thousands of GPU hours. If that's what it takes, it's likely quicker and simpler to design real-world AI applications with the fallible guidance of educated guesswork. Next month, however, a team of MIT researchers will be presenting a so-called "neural architecture search" algorithm that can speed up the AI-optimized AI design process by 240 times or more. That would put faster and more accurate AI within practical reach for a broad class of image recognition algorithms and other related applications.
Efficient deep learning computing requires algorithm and hardware co-design to enable specialization: we usually need to change the algorithm to reduce memory footprint and improve energy efficiency. However, the extra degree of freedom from the algorithm makes the design space much larger: it's not only about designing the hardware but also about how to tweak the algorithm to best fit the hardware. Human engineers can hardly exhaust the design space by heuristics. It's labor consuming and sub-optimal. We propose design automation techniques for efficient neural networks. We investigate automatically designing specialized fast models, auto channel pruning, and auto mixed-precision quantization. We demonstrate such learning-based, automated design achieves superior performance and efficiency than rule-based human design. Moreover, we shorten the design cycle by 200x than previous work, so that we can afford to design specialized neural network models for different hardware platforms.
We introduce a new function-preserving transformation for efficient neural architecture search. This network transformation allows reusing previously trained networks and existing successful architectures that improves sample efficiency. We aim to address the limitation of current network transformation operations that can only perform layer-level architecture modifications, such as adding (pruning) filters or inserting (removing) a layer, which fails to change the topology of connection paths. Our proposed path-level transformation operations enable the meta-controller to modify the path topology of the given network while keeping the merits of reusing weights, and thus allow efficiently designing effective structures with complex path topologies like Inception models. We further propose a bidirectional tree-structured reinforcement learning meta-controller to explore a simple yet highly expressive tree-structured architecture space that can be viewed as a generalization of multi-branch architectures. We experimented on the image classification datasets with limited computational resources (about 200 GPU-hours), where we observed improved parameter efficiency and better test results (97.70% test accuracy on CIFAR-10 with 14.3M parameters and 74.6% top-1 accuracy on ImageNet in the mobile setting), demonstrating the effectiveness and transferability of our designed architectures.
Model-based compression is an effective, facilitating, and expanded model of neural network models with limited computing and low power. However, conventional models of compression techniques utilize crafted features [2,3,12] and explore specialized areas for exploration and design of large spaces in terms of size, speed, and accuracy, which usually have returns Less and time is up. This paper will effectively analyze deep auto compression (ADC) and reinforcement learning strength in an effective sample and space design, and improve the compression quality of the model. The results of compression of the advanced model are obtained without any human effort and in a completely automated way. With a 4- fold reduction in FLOP, the accuracy of 2.8% is higher than the manual compression model for VGG-16 in ImageNet.
Abstract--Convolutional Neural Networks (CNNs) are extremely computationally demanding, presenting a large barrier to their deployment on resource-constrained devices. Since such systems are where some of their most useful applications lie (e.g. In this paper we unify the two viewpoints in a Deep Learning Inference Stack and take an across-stack approach by implementing and evaluating the most common neural network compression techniques (weight pruning, channel pruning, and quantisation) and optimising their parallel execution with a range of programming approaches (OpenMP, OpenCL) and hardware architectures (CPU, GPU). We provide comprehensive Pareto curves to instruct tradeoffs under constraints of accuracy, execution time, and memory space. Recent years have yielded rapid advances in the field of deep learning, largely due to the unparalleled effectiveness of Convolutional Neural Networks (CNNs) on a variety of difficult problems . These networks are designed to run on servers with negligible resource constraints, utilising powerful GPUs. As such, creative approaches are required to deploy them on hardware with limited resources in order to enable many useful applications (e.g. However, currently these optimisation approaches come with limited benchmarks and few comparisons. We outline a first step towards a more comprehensive understanding of the performance available under different constraints of inference accuracy, execution time, and memory space. Since  used CNNs to outperform more traditional statistical techniques on the ImageNet dataset  they have become a standard tool for image processing. With a growing ecosystem dedicated to training deep neural networks, the number of parameters that state-of-the-art networks demand has vastly increased; in 2012 the state-of-the-art, AlexNet, had 61M parameters spread over eight layers whereas the most recent ImageNet winner uses an ensemble of SENets , the largest of which has 115M parameters across 154 layers.