Landscape of Sparse Linear Network: A Brief Investigation
Dachao Lin, Ruoyu Sun, Zhihua Zhang
Deep neural networks have achieved remarkable empirical successes in computer vision, speech recognition, and natural language processing, sparking interest in the theory behind their architectures and training. However, deep neural networks are often highly overparameterized, making them computationally expensive: they demand large amounts of memory and compute, and training on large datasets such as ImageNet (Deng et al. [7]) may take weeks even on a modern multi-GPU server. Hence, they are often unsuitable for smaller devices such as embedded electronics, and there is a pressing demand for techniques that produce models with reduced size, faster inference, and lower power consumption. Sparse networks, that is, neural networks in which a large fraction of the model parameters are zero, have emerged as one of the leading approaches for reducing model parameter count. It has been shown empirically that deep neural networks can achieve state-of-the-art results under high levels of sparsity [17, 14, 22].
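To make the notion of sparsity concrete, the following is a minimal Python sketch, not taken from the paper, of magnitude pruning applied to a two-layer linear network: the smallest-magnitude weights are set to zero so that a chosen fraction of the parameters is exactly zero. The layer sizes and the 90% sparsity level are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense weights of a two-layer linear network: y = W2 @ (W1 @ x).
# Sizes are arbitrary, chosen only for this example.
W1 = rng.standard_normal((64, 128))
W2 = rng.standard_normal((10, 64))

def sparsify(W, sparsity=0.9):
    """Zero out the smallest-magnitude entries so that roughly a
    `sparsity` fraction of the weights becomes exactly zero
    (simple magnitude pruning)."""
    k = int(sparsity * W.size)
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    mask = np.abs(W) > threshold
    return W * mask, mask

W1_sparse, mask1 = sparsify(W1, sparsity=0.9)
W2_sparse, mask2 = sparsify(W2, sparsity=0.9)

# Compare the dense and pruned networks on a random input.
x = rng.standard_normal(128)
y_dense = W2 @ (W1 @ x)
y_sparse = W2_sparse @ (W1_sparse @ x)

print("fraction of zero weights in W1:", 1.0 - mask1.mean())
print("output change from pruning:", np.linalg.norm(y_dense - y_sparse))
```

The sketch only illustrates what a sparse (mostly-zero) linear network looks like; it does not implement the analysis carried out in the paper.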
September 15, 2020