
Bringing AI To Edge: From Deep Learning's Perspective

arXiv.org Artificial Intelligence

Edge computing and artificial intelligence (AI), especially deep learning, are gradually intersecting to build a novel system called edge intelligence. However, the development of edge intelligence systems encounters several challenges, one of which is the computational gap between computation-intensive deep learning algorithms and less capable edge systems. Because of this gap, many edge intelligence systems cannot meet their expected performance requirements. To bridge it, a plethora of deep learning techniques and optimization methods have been proposed in recent years: lightweight deep learning models, network compression, and efficient neural architecture search. Although some reviews and surveys have partially covered this large body of literature, a systematic and comprehensive review discussing all aspects of these deep learning techniques, which are critical for edge intelligence implementation, is still lacking. As diverse methods applicable to edge systems continue to appear at a rapid pace, a holistic review would help edge computing engineers and the wider community keep track of the state-of-the-art deep learning techniques that are instrumental for edge intelligence and would facilitate the development of edge intelligence systems. This paper surveys representative and recent deep learning techniques that are useful for edge intelligence systems, including hand-crafted models, model compression, hardware-aware neural architecture search, and adaptive deep learning models. Finally, based on our observations and simple experiments, we discuss some future directions.


A novel channel pruning method for deep neural network compression

arXiv.org Machine Learning

In recent years, deep neural networks have achieved great success in the field of computer vision. However, deploying these deep models on resource-constrained embedded devices such as mobile robots and smartphones remains a major challenge. Network compression is therefore a reasonable solution to reduce memory consumption and computational complexity on such platforms. In this paper, a novel channel pruning method based on a genetic algorithm is proposed to compress very deep Convolutional Neural Networks (CNNs). First, a pre-trained CNN model is pruned layer by layer according to the sensitivity of each layer. The pruned model is then fine-tuned within a knowledge distillation framework. Together, these two steps significantly reduce model redundancy with only a small accuracy drop. Channel selection is a combinatorial optimization problem with an exponentially large solution space. To accelerate the selection process, the proposed method formulates it as a search problem that can be solved efficiently by a genetic algorithm. In addition, a two-step approximate fitness function is designed to further improve the efficiency of the genetic search. The proposed method has been verified on three benchmark datasets with two popular CNN models, VGGNet and ResNet. On the CIFAR-100 and ImageNet datasets, our approach outperforms several state-of-the-art methods. On the CIFAR-10 and SVHN datasets, the pruned VGGNet achieves better performance than the original model with 8× parameter compression and 3× FLOPs reduction.
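The genetic search over channel masks described in this abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration in plain Python/NumPy: the per-channel importance scores and the fitness function (retained L1 norm minus a FLOPs penalty) are stand-in assumptions rather than the paper's two-step accuracy approximation, and the layer size and GA hyperparameters are arbitrary.

```python
# Hypothetical sketch of genetic-algorithm channel selection for one conv layer.
# The fitness below is a cheap stand-in proxy (retained L1 norm minus a FLOPs
# penalty); the paper instead uses a two-step approximation of accuracy.
import numpy as np

rng = np.random.default_rng(0)

NUM_CHANNELS = 64          # channels in the layer being pruned (assumed)
POP_SIZE, GENERATIONS = 20, 30
MUTATION_RATE = 0.02
FLOPS_PENALTY = 0.5        # trade-off between importance proxy and cost

filter_l1 = rng.random(NUM_CHANNELS)   # stand-in for per-channel importance

def fitness(mask: np.ndarray) -> float:
    """Higher is better: keep important channels, penalise kept FLOPs."""
    kept_importance = filter_l1[mask.astype(bool)].sum()
    kept_ratio = mask.mean()
    return kept_importance - FLOPS_PENALTY * kept_ratio * filter_l1.sum()

# Each individual is a binary mask over channels (1 = keep).
population = rng.integers(0, 2, size=(POP_SIZE, NUM_CHANNELS))

for _ in range(GENERATIONS):
    scores = np.array([fitness(ind) for ind in population])
    # Keep the better half of the population as parents.
    parents = population[np.argsort(scores)[-POP_SIZE // 2:]]
    children = []
    while len(children) < POP_SIZE - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, NUM_CHANNELS)          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(NUM_CHANNELS) < MUTATION_RATE
        child = np.where(flip, 1 - child, child)     # bit-flip mutation
        children.append(child)
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(ind) for ind in population])]
print(f"kept {int(best.sum())}/{NUM_CHANNELS} channels")
```

In a real pipeline the proxy fitness would be replaced by the paper's two-step accuracy approximation, the best mask applied layer by layer according to sensitivity, and the resulting network fine-tuned with knowledge distillation.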


An Overview of Neural Network Compression

arXiv.org Machine Learning

Overparameterized networks trained to convergence have shown impressive performance in domains such as computer vision and natural language processing. Pushing the state of the art on salient tasks within these domains corresponds to these models becoming larger and more difficult for machine learning practitioners to use, given the increasing memory and storage requirements, not to mention the larger carbon footprint. Thus, in recent years there has been a resurgence in model compression techniques, particularly for deep convolutional neural networks and self-attention based networks such as the Transformer. Hence, this paper provides a timely overview of both older and current compression techniques for deep neural networks, including pruning, quantization, tensor decomposition, knowledge distillation, and combinations thereof. We assume a basic familiarity with deep learning architectures (for an introduction to deep learning, see Goodfellow et al., 2016), namely Recurrent Neural Networks (RNNs; Rumelhart et al., 1985; Hochreiter and Schmidhuber, 1997), Convolutional Neural Networks (Fukushima, 1980; for an up-to-date overview, see Khan et al., 2019), and self-attention based networks (Vaswani et al., 2017; for a general overview of self-attention networks and their use in natural language processing, see Chaudhari et al., 2019, and Hu, 2019). Most of the papers discussed are proposed in the context of at least one of these DNN architectures.
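To make one of the surveyed techniques concrete, the sketch below shows a knowledge-distillation objective in NumPy: the student is trained against a blend of hard labels and the teacher's temperature-softened outputs, following Hinton et al.'s formulation with its T² scaling. The random logits, temperature T, and mixing weight alpha are placeholder assumptions for illustration only, not values from this paper.

```python
# Minimal sketch of a knowledge-distillation objective: a mix of hard-label
# cross-entropy and KL divergence against temperature-softened teacher outputs.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """alpha weights the soft (teacher) term against the hard-label term."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student), scaled by T^2 as in Hinton et al. (2015).
    soft = (p_teacher * (np.log(p_teacher + 1e-12) -
                         np.log(p_student + 1e-12))).sum(axis=-1).mean() * T * T
    # Standard cross-entropy on the ground-truth labels.
    p_hard = softmax(student_logits)
    hard = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard

batch, classes = 8, 10
student = rng.normal(size=(batch, classes))   # placeholder student logits
teacher = rng.normal(size=(batch, classes))   # placeholder teacher logits
labels = rng.integers(0, classes, size=batch)
print(f"distillation loss: {distillation_loss(student, teacher, labels):.4f}")
```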


An Overview of Model Compression Techniques for Deep Learning in Space

#artificialintelligence

Every day we depend on extraterrestrial devices to send us information about the state of the Earth and surrounding space--currently, there are about 3,000 satellites orbiting the Earth, and this number is growing rapidly. Processing and transmitting the wealth of data these devices produce is not a trivial task, given that resources in space such as on-board memory and downlink bandwidth face tight constraints. In the case of satellite images, the data at hand can be extremely large, sometimes as large as 8,000 × 8,000 pixels. For most practical applications, only part of the great amount of detail encoded in these images is of interest--the footprints of buildings, for example--but the current standard approach is to transmit the entire images back to Earth for processing. A more efficient solution would be to process the data on board the spacecraft, arriving at a compressed representation that occupies fewer resources--something that could be achieved using a machine learning model. Unfortunately, running machine learning models tends to be a resource-intensive process even here on Earth. State-of-the-art networks typically consist of many millions of parameters, and limited uplink bandwidth makes uploading such large networks to satellites impractical.


Resource-Efficient Neural Networks for Embedded Systems

arXiv.org Machine Learning

While machine learning is traditionally a resource-intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges in current machine learning research and key to ensuring a smooth transition of machine learning technology from a scientific environment with virtually unlimited computing resources into everyday applications. In this article, we provide an overview of the current state of the art of machine learning techniques facilitating these real-world requirements. In particular, we focus on deep neural networks (DNNs), the predominant machine learning models of the past decade. We give a comprehensive overview of the vast literature, which can be split into three non-mutually exclusive categories: (i) quantized neural networks, (ii) network pruning, and (iii) structural efficiency. These techniques can be applied during training or as post-processing, and they are widely used to reduce computational demands in terms of memory footprint, inference speed, and energy efficiency. We substantiate our discussion with experiments on well-known benchmark datasets to showcase the difficulty of finding good trade-offs between resource efficiency and predictive performance.
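As a concrete instance of the first category (quantized neural networks), the sketch below applies post-training 8-bit affine quantization to a single weight tensor in NumPy. The tensor shape, the per-tensor scale, and the omission of activation quantization and calibration are simplifying assumptions for illustration; production toolchains handle all of these.

```python
# Minimal sketch of post-training 8-bit affine quantization of one weight
# tensor: floats are mapped to int8 via a scale and zero point, cutting the
# memory footprint by 4x at the cost of a small reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(256, 256)).astype(np.float32)  # toy layer

def quantize_int8(w):
    """Map float weights to int8 with a per-tensor scale and zero point."""
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / 255.0
    zero_point = np.round(-w_min / scale) - 128
    q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

q, scale, zp = quantize_int8(weights)
w_hat = dequantize(q, scale, zp)

print(f"size: {weights.nbytes} B -> {q.nbytes} B (4x smaller)")
print(f"max reconstruction error: {np.abs(weights - w_hat).max():.5f}")
```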