Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank
Zhao, Liang; Liao, Siyu; Wang, Yanzhi; Li, Zhe; Tang, Jian; Pan, Victor; Yuan, Bo
Neural networks, especially large-scale deep neural networks, have achieved remarkable success in applications such as computer vision and natural language processing [14][21]. However, large-scale neural networks are both memory-intensive and computation-intensive, which poses severe challenges when deploying them on memory-constrained and energy-constrained embedded devices. To overcome these limitations, many approaches, such as connection pruning [9][8], low-rank approximation [7][12], and sparsity regularization [23][16], have been proposed to reduce the model size of large-scale (deep) neural networks.

LDR Construction and LDR Neural Networks: Among those efforts, low displacement rank (LDR) construction is a structure-imposing technique for reducing both network model size and computational complexity. By constraining the weight matrices of a neural network to be LDR matrices (when the weight matrices are square) or compositions of multiple LDR matrices (when they are non-square), a strong structure is imposed on the network. Since an LDR matrix typically requires only O(n) independent parameters and admits fast matrix operation algorithms [18], this leaves substantial room for reducing model size and computational complexity. Pioneering work in this direction [3][20] applied special types of LDR matrices (structured matrices), such as circulant

Figure 1: Examples of commonly used LDR (structured) matrices, i.e., circulant, Cauchy, Toeplitz, Hankel, and Vandermonde matrices.
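To make the O(n) parameter count and the fast matrix operations concrete, the following is a minimal sketch for one of the structured matrices named above, the circulant matrix. It is an illustration under stated assumptions, not the construction used in the paper: the helper names (`fft`, `circulant_dense`, `dense_matvec`, `circulant_matvec_fft`) are hypothetical, the matrix is assumed to be defined by its first column, and the small radix-2 FFT assumes the size n is a power of two. An n × n circulant matrix is stored as n numbers, and its matrix-vector product is a circular convolution, which the FFT evaluates in O(n log n) rather than O(n^2).

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circulant_dense(first_column):
    """Expand the n defining parameters into the full n x n circulant matrix."""
    n = len(first_column)
    return [[first_column[(i - j) % n] for j in range(n)] for i in range(n)]


def dense_matvec(matrix, x):
    """Reference O(n^2) matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def circulant_matvec_fft(first_column, x):
    """O(n log n) product: circular convolution of first_column and x via FFT."""
    n = len(x)
    spectrum = [a * b for a, b in zip(fft(first_column), fft(x))]
    # The inverse transform above is unnormalized, so divide by n here.
    return [v.real / n for v in fft(spectrum, inverse=True)]


# Illustrative values only (assumed for this sketch).
c = [1.0, 2.0, 3.0, 4.0]   # n = 4 parameters define all 16 matrix entries
x = [1.0, 0.0, -1.0, 2.0]

dense_result = dense_matvec(circulant_dense(c), x)
fast_result = circulant_matvec_fft(c, x)
```

Here `dense_result` and `fast_result` agree up to floating-point rounding, although the fast path never forms the full matrix. In a neural network layer, this is the sense in which a structured weight matrix saves both storage and computation.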
Sep-21-2017