A novel channel pruning method for deep neural network compression

arXiv.org Machine Learning

In recent years, deep neural networks have achieved great success in the field of computer vision. However, it is still a big challenge to deploy these deep models on resource-constrained embedded devices such as mobile robots, smart phones and so on. Therefore, network compression for such platforms is a reasonable solution to reduce memory consumption and computation complexity. In this paper, a novel channel pruning method based on genetic algorithm is proposed to compress very deep Convolution Neural Networks (CNNs). Firstly, a pre-trained CNN model is pruned layer by layer according to the sensitivity of each layer. After that, the pruned model is fine-tuned based on knowledge distillation framework. These two improvements significantly decrease the model redundancy with less accuracy drop. Channel selection is a combinatorial optimization problem that has exponential solution space. In order to accelerate the selection process, the proposed method formulates it as a search problem, which can be solved efficiently by genetic algorithm. Meanwhile, a two-step approximation fitness function is designed to further improve the efficiency of genetic process. The proposed method has been verified on three benchmark datasets with two popular CNN models: VGGNet and ResNet. On the CIFAR-100 and ImageNet datasets, our approach outperforms several state-of-the-art methods. On the CIFAR-10 and SVHN datasets, the pruned VGGNet achieves better performance than the original model with 8 times parameters compression and 3 times FLOPs reduction.

N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning

arXiv.org Machine Learning

While bigger and deeper neural network architectures continue to advance the state-of-the-art for many computer vision tasks, real-world adoption of these networks is impeded by hardware and speed constraints. Conventional model compression methods attempt to address this problem by modifying the architecture manually or using pre-defined heuristics. Since the space of all reduced architectures is very large, modifying the architecture of a deep neural network in this way is a difficult task. In this paper, we tackle this issue by introducing a principled method for learning reduced network architectures in a data-driven way using reinforcement learning. Our approach takes a larger `teacher' network as input and outputs a compressed `student' network derived from the `teacher' network. In the first stage of our method, a recurrent policy network aggressively removes layers from the large `teacher' model. In the second stage, another recurrent policy network carefully reduces the size of each remaining layer. The resulting network is then evaluated to obtain a reward -- a score based on the accuracy and compression of the network. Our approach uses this reward signal with policy gradients to train the policies to find a locally optimal student network. Our experiments show that we can achieve compression rates of more than 10x for models such as ResNet-34 while maintaining similar performance to the input `teacher' network. We also present a valuable transfer learning result which shows that policies which are pre-trained on smaller `teacher' networks can be used to rapidly speed up training on larger `teacher' networks.

Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks

arXiv.org Machine Learning

Abstract--Convolutional Neural Networks (CNNs) are extremely computationally demanding, presenting a large barrier to their deployment on resource-constrained devices. Since such systems are where some of their most useful applications lie (e.g. In this paper we unify the two viewpoints in a Deep Learning Inference Stack and take an across-stack approach by implementing and evaluating the most common neural network compression techniques (weight pruning, channel pruning, and quantisation) and optimising their parallel execution with a range of programming approaches (OpenMP, OpenCL) and hardware architectures (CPU, GPU). We provide comprehensive Pareto curves to instruct tradeoffs under constraints of accuracy, execution time, and memory space. Recent years have yielded rapid advances in the field of deep learning, largely due to the unparalleled effectiveness of Convolutional Neural Networks (CNNs) on a variety of difficult problems [1]. These networks are designed to run on servers with negligible resource constraints, utilising powerful GPUs. As such, creative approaches are required to deploy them on hardware with limited resources in order to enable many useful applications (e.g. However, currently these optimisation approaches come with limited benchmarks and few comparisons. We outline a first step towards a more comprehensive understanding of the performance available under different constraints of inference accuracy, execution time, and memory space. Since [7] used CNNs to outperform more traditional statistical techniques on the ImageNet dataset [8] they have become a standard tool for image processing. With a growing ecosystem dedicated to training deep neural networks, the number of parameters that state-of-the-art networks demand has vastly increased; in 2012 the state-of-the-art, AlexNet, had 61M parameters spread over eight layers whereas the most recent ImageNet winner uses an ensemble of SENets [9], the largest of which has 115M parameters across 154 layers.

Network Slimming by Slimmable Networks: Towards One-Shot Architecture Search for Channel Numbers

arXiv.org Artificial Intelligence

We study how to set channel numbers in a neural network to achieve better accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or model size). A simple and one-shot solution, named AutoSlim, is presented. Instead of training many network samples and searching with reinforcement learning, we train a single slimmable network to approximate the network accuracy of different channel configurations. We then iteratively evaluate the trained slimmable model and greedily slim the layer with minimal accuracy drop. By this single pass, we can obtain the optimized channel configurations under different resource constraints. We present experiments with MobileNet v1, MobileNet v2, ResNet-50 and RL-searched MNasNet on ImageNet classification. We show significant improvements over their default channel configurations. We also achieve better accuracy than recent channel pruning methods and neural architecture search methods. Notably, by setting optimized channel numbers, our AutoSlim-MobileNet-v2 at 305M FLOPs achieves 74.2% top-1 accuracy, 2.4% better than default MobileNet-v2 (301M FLOPs), and even 0.2% better than RL-searched MNasNet (317M FLOPs). Our AutoSlim-ResNet-50 at 570M FLOPs, without depthwise convolutions, achieves 1.3% better accuracy than MobileNet-v1 (569M FLOPs). Code and models will be available at: https://github.com/JiahuiYu/slimmable_networks

Design Automation for Efficient Deep Learning Computing

arXiv.org Machine Learning

Efficient deep learning computing requires algorithm and hardware co-design to enable specialization: we usually need to change the algorithm to reduce memory footprint and improve energy efficiency. However, the extra degree of freedom from the algorithm makes the design space much larger: it's not only about designing the hardware but also about how to tweak the algorithm to best fit the hardware. Human engineers can hardly exhaust the design space by heuristics. It's labor consuming and sub-optimal. We propose design automation techniques for efficient neural networks. We investigate automatically designing specialized fast models, auto channel pruning, and auto mixed-precision quantization. We demonstrate such learning-based, automated design achieves superior performance and efficiency than rule-based human design. Moreover, we shorten the design cycle by 200x than previous work, so that we can afford to design specialized neural network models for different hardware platforms.