T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor

arXiv.org Artificial Intelligence

Recent findings indicate that over-parametrization, while crucial for successfully training deep neural networks, also introduces large amounts of redundancy. Tensor methods have the potential to parametrize over-complete representations efficiently by leveraging this redundancy. In this paper, we propose to fully parametrize Convolutional Neural Networks (CNNs) with a single high-order, low-rank tensor. Previous works on network tensorization have focused on parametrizing individual layers (convolutional or fully connected) only, performing the tensorization layer by layer and in isolation. In contrast, we propose to jointly capture the full structure of a neural network by parametrizing it with a single high-order tensor, the modes of which represent each of the architectural design parameters of the network (e.g. number of convolutional blocks, depth, number of stacks, input features, etc.). This parametrization allows us to regularize the whole network and drastically reduce the number of parameters. Our model is end-to-end trainable, and the low-rank structure imposed on the weight tensor acts as an implicit regularizer. We study the case of networks with rich structure, namely Fully Convolutional Networks (FCNs), which we propose to parametrize with a single 8th-order tensor. We show that our approach can achieve superior performance at low compression rates, and attain high compression rates with a negligible drop in accuracy on the challenging task of human pose estimation.
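
As a rough illustration of the idea (not the authors' code), the sketch below builds a single Tucker-factorized weight tensor whose leading modes index the architecture and whose trailing modes are the usual convolution dimensions. The mode sizes, ranks, and the use of the tensorly library are illustrative assumptions, and only 7 modes are used here for brevity rather than the paper's 8.

    # Minimal sketch: one low-rank (Tucker) tensor holds all conv weights.
    # All sizes and ranks below are made up for illustration.
    import numpy as np
    import tensorly as tl

    # Hypothetical architectural modes: (stacks, blocks, depth, out_ch, in_ch, kh, kw)
    full_shape = (2, 4, 3, 64, 64, 3, 3)   # 7th-order here for brevity
    ranks      = (2, 3, 2, 16, 16, 3, 3)   # low multilinear (Tucker) ranks

    core    = np.random.randn(*ranks)
    factors = [np.random.randn(d, r) for d, r in zip(full_shape, ranks)]

    # Reconstruct the full weight tensor from the compact Tucker parameters.
    weights = tl.tucker_to_tensor((core, factors))

    # One (stack, block, depth) index selects an ordinary conv kernel.
    kernel = weights[0, 1, 2]              # shape (64, 64, 3, 3)
    print(kernel.shape)

Training would update only the core and the factor matrices, so the number of stored parameters is governed by the chosen ranks rather than by the full tensor.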


Learning Low-Rank Approximation for CNNs

arXiv.org Machine Learning

Low-rank approximation is an effective model compression technique that reduces not only parameter storage requirements but also computation. For convolutional neural networks (CNNs), however, well-known low-rank approximation methods, such as Tucker or CP decomposition, result in degraded model accuracy because decomposed layers hinder training convergence. In this paper, we propose a new training technique that finds a flat minimum from the viewpoint of low-rank approximation, without introducing a decomposed structure during training. Because the original model structure is preserved, 2-dimensional low-rank approximation, which demands lowering (such as im2col), remains available in our proposed scheme. We show that CNN models can be compressed by low-rank approximation with a much higher compression ratio than conventional training methods allow, while maintaining or even enhancing model accuracy. We also discuss various 2-dimensional low-rank approximation techniques for CNNs.
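
For context, a common way to realize such a 2-dimensional low-rank approximation is to "lower" the 4-D kernel into a matrix (as im2col does for activations) and truncate its SVD. The sketch below, with hypothetical shapes and rank, shows this post-hoc factorization only; it is not the paper's flat-minimum training procedure.

    # Minimal sketch: 2-D low-rank approximation of a conv kernel after
    # im2col-style reshaping. Shapes and rank are illustrative assumptions.
    import numpy as np

    def lowrank_2d(kernel, rank):
        out_ch, in_ch, kh, kw = kernel.shape
        mat = kernel.reshape(out_ch, in_ch * kh * kw)    # "lowered" 2-D view
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank]    # rank-r reconstruction
        return approx.reshape(out_ch, in_ch, kh, kw)

    kernel = np.random.randn(64, 32, 3, 3)
    compressed = lowrank_2d(kernel, rank=16)
    print(np.linalg.norm(kernel - compressed) / np.linalg.norm(kernel))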


Low-Rank Embedding of Kernels in Convolutional Neural Networks under Random Shuffling

arXiv.org Machine Learning

Although convolutional neural networks (CNNs) have recently become popular for various image processing and computer vision tasks, reducing the storage cost of their parameters on resource-limited platforms remains a challenging problem. In previous studies, tensor decomposition (TD) has achieved promising compression performance by embedding the kernel of a convolutional layer into a low-rank subspace. However, TD has been applied naively to the kernel or its specific variants. Unlike these conventional approaches, this paper shows that the kernel can be embedded into more general, or even random, low-rank subspaces. We demonstrate this by compressing the convolutional layers via randomly-shuffled tensor decomposition (RsTD) on a standard classification task using CIFAR-10. In addition, we analyze how the spatial similarity of the training data influences the low-rank structure of the kernels. The experimental results show that the CNN can be significantly compressed even when the kernels are randomly shuffled. Furthermore, the RsTD-based method yields more stable classification accuracy than conventional TD-based methods over a large range of compression ratios.
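
The sketch below illustrates the shuffling idea in isolation and is not the authors' RsTD implementation: a fixed random permutation is applied to the kernel entries, the shuffled tensor is decomposed (a Tucker decomposition via tensorly is assumed here, with illustrative shapes and ranks), and the permutation is undone after reconstruction.

    # Minimal sketch: decompose a randomly shuffled kernel, then unshuffle.
    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import tucker

    kernel = np.random.randn(64, 32, 3, 3)
    perm   = np.random.permutation(kernel.size)    # fixed shuffle, stored once
    inv    = np.argsort(perm)                      # its inverse

    shuffled = kernel.ravel()[perm].reshape(kernel.shape)
    core, factors = tucker(shuffled, rank=[16, 8, 3, 3])
    recon = tl.tucker_to_tensor((core, factors))

    # Undo the permutation to recover an approximation of the original kernel.
    unshuffled = recon.ravel()[inv].reshape(kernel.shape)
    print(np.linalg.norm(kernel - unshuffled) / np.linalg.norm(kernel))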


Wide Compression: Tensor Ring Nets

arXiv.org Machine Learning

Deep neural networks have demonstrated state-of-the-art performance in a variety of real-world applications. In order to obtain performance gains, these networks have grown larger and deeper, containing millions or even billions of parameters and over a thousand layers. The trade-off is that these large architectures require an enormous amount of memory, storage, and computation, thus limiting their usability. Inspired by the recent tensor ring factorization, we introduce Tensor Ring Networks (TR-Nets), which significantly compress both the fully connected layers and the convolutional layers of deep neural networks. Our results show that our TR-Nets approach is able to compress LeNet-5 by 11× without losing accuracy, and can compress the state-of-the-art Wide ResNet by 243× with only 2.3% degradation in CIFAR-10 image classification. Overall, this compression scheme shows promise in scientific computing and deep learning, especially for emerging resource-constrained devices such as smartphones, wearables, and IoT devices.
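
As a reference for the factorization itself, the sketch below reconstructs a weight tensor from a ring of 3-D cores by contracting neighbouring rank indices and closing the ring with a trace. The shapes and the uniform TR-rank are illustrative assumptions, not the paper's settings.

    # Minimal sketch: rebuild a full tensor from tensor ring (TR) cores.
    import numpy as np

    def tr_to_tensor(cores):
        # cores[k] has shape (r_k, n_k, r_{k+1}); the last rank wraps around to the first.
        result = cores[0]
        for core in cores[1:]:
            # contract the trailing rank index with the next core's leading rank index
            result = np.tensordot(result, core, axes=([-1], [0]))
        # close the ring: trace over the first and last rank indices
        return np.trace(result, axis1=0, axis2=result.ndim - 1)

    shape = (4, 8, 8, 9)     # e.g. a conv kernel reshaped into four modes
    rank  = 5                # uniform TR-rank, for illustration only
    cores = [np.random.randn(rank, n, rank) for n in shape]
    weight = tr_to_tensor(cores)
    print(weight.shape)      # (4, 8, 8, 9)

The compression comes from storing only the small 3-D cores (sum of r·n·r entries) instead of the full tensor (product of the mode sizes).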


A Unified Approximation Framework for Deep Neural Networks

arXiv.org Artificial Intelligence

Deep neural networks (DNNs) have achieved significant success in a variety of real-world applications. However, the enormous number of parameters in these networks limits their efficiency due to the large model size and intensive computation. To address this issue, various compression and acceleration techniques have been investigated, among which low-rank filters and sparse filters are heavily studied. In this paper we propose a unified framework to compress convolutional neural networks by combining these two strategies, while taking the nonlinear activation into consideration. The filter of a layer is approximated by the sum of a sparse component and a low-rank component, both of which favor model compression. In particular, we constrain the sparse component to be structured sparse, which facilitates acceleration. The performance of the network is retained by minimizing the reconstruction error of the feature maps after activation of each layer, using the alternating direction method of multipliers (ADMM). The experimental results show that our proposed approach can compress VGG-16 and AlexNet by over 4×. In addition, speedups of 2.2× and 1.1× are achieved on VGG-16 and AlexNet, respectively, at the cost of a slight increase in error rate.
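
The sketch below illustrates only the sparse-plus-low-rank split on a single reshaped filter matrix, using a simplified alternation (truncated SVD for the low-rank part, magnitude thresholding for the sparse part) in place of the paper's structured-sparsity constraint and its ADMM objective on post-activation feature maps. All shapes and hyper-parameters are illustrative assumptions.

    # Minimal sketch: approximate a 2-D filter matrix W as sparse S + low-rank L.
    import numpy as np

    def sparse_plus_lowrank(W, rank, sparsity, n_iter=20):
        S = np.zeros_like(W)
        for _ in range(n_iter):
            # low-rank component: truncated SVD of the residual W - S
            u, s, vt = np.linalg.svd(W - S, full_matrices=False)
            L = (u[:, :rank] * s[:rank]) @ vt[:rank]
            # sparse component: keep only the largest-magnitude residual entries
            R = W - L
            thresh = np.quantile(np.abs(R), 1.0 - sparsity)
            S = np.where(np.abs(R) >= thresh, R, 0.0)
        return S, L

    W = np.random.randn(256, 1152)    # e.g. a conv filter reshaped to 2-D
    S, L = sparse_plus_lowrank(W, rank=32, sparsity=0.05)
    print(np.linalg.norm(W - (S + L)) / np.linalg.norm(W))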