Deep Networks from the Principle of Rate Reduction
Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, Yi Ma
This work attempts to interpret modern deep (convolutional) networks from the principles of rate reduction and (shift) invariant classification. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction of learned features naturally leads to a multi-layer deep network, one iteration per layer. The layered architectures, linear and nonlinear operators, and even the parameters of the network are all explicitly constructed layer by layer in a forward-propagation fashion by emulating the gradient scheme. All components of this "white box" network have precise optimization, statistical, and geometric interpretations. This principled framework also reveals and justifies the role of multi-channel lifting and sparse coding in the early stages of deep networks. Moreover, all linear operators of the so-derived network naturally become multi-channel convolutions when we enforce classification to be rigorously shift-invariant. The derivation also indicates that such a convolutional network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments indicate that a deep network so constructed can already learn a good discriminative representation even without any back-propagation training.

In recent years, various deep (convolutional) network architectures such as AlexNet (Krizhevsky et al., 2012), VGG (Simonyan & Zisserman, 2015), ResNet (He et al., 2016), DenseNet (Huang et al., 2017), recurrent CNNs, LSTMs (Hochreiter & Schmidhuber, 1997), Capsule Networks (Hinton et al., 2011), etc., have demonstrated very good performance on classification tasks over real-world data such as speech or images. Nevertheless, almost all such networks have been developed through years of empirical trial and error, covering both their architectures/operators and the ways they are effectively trained. Some recent practices even take this to the extreme by searching for effective network structures and training strategies through extensive random search techniques, such as Neural Architecture Search (Zoph & Le, 2017; Baker et al., 2017), AutoML (Hutter et al., 2019), and Learning to Learn (Andrychowicz et al., 2016).
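For a concrete handle on the objective behind this construction, the following is a minimal NumPy sketch of the rate-reduction quantity and of a single gradient-ascent step of the kind the abstract describes as "one iteration per layer". It assumes the coding-rate form ΔR(Z, Π) = R(Z) − R_c(Z, Π) from the authors' maximal-coding-rate-reduction line of work; the function names, the numeric settings (eps, eta), and the plain full-matrix inverses are illustrative choices for small toy data, not the paper's actual ReduNet implementation (which further derives convolutional, spectral-domain versions of these operators).

```python
import numpy as np

# Illustrative sketch (not the authors' released code). Features Z are d x n
# with unit-norm columns; Pi is a list of boolean masks, one per class,
# selecting which columns of Z belong to that class.

def rate_reduction(Z, Pi, eps=0.1):
    """Delta R(Z, Pi) = R(Z) - R_c(Z, Pi): coding rate of all features minus
    the sum of class-conditional coding rates (larger = more discriminative)."""
    d, n = Z.shape
    alpha = d / (n * eps ** 2)
    R = 0.5 * np.linalg.slogdet(np.eye(d) + alpha * Z @ Z.T)[1]
    Rc = 0.0
    for mask in Pi:
        n_j = mask.sum()
        if n_j == 0:
            continue
        alpha_j = d / (n_j * eps ** 2)
        Zj = Z[:, mask]
        Rc += (n_j / (2 * n)) * np.linalg.slogdet(np.eye(d) + alpha_j * Zj @ Zj.T)[1]
    return R - Rc

def layer_step(Z, Pi, eta=0.5, eps=0.1):
    """One gradient-ascent step on Delta R, i.e. one 'layer' of the unrolled
    network: Z <- normalize(Z + eta * (E @ Z - sum_j gamma_j * C_j @ Z_j)),
    where E and C_j come from the analytic gradients of the two log-det terms."""
    d, n = Z.shape
    alpha = d / (n * eps ** 2)
    E = alpha * np.linalg.inv(np.eye(d) + alpha * Z @ Z.T)      # expansion operator
    grad = E @ Z
    for mask in Pi:
        n_j = mask.sum()
        if n_j == 0:
            continue
        gamma_j = n_j / n
        alpha_j = d / (n_j * eps ** 2)
        Zj = Z[:, mask]
        C_j = alpha_j * np.linalg.inv(np.eye(d) + alpha_j * Zj @ Zj.T)  # compression operator
        grad[:, mask] -= gamma_j * C_j @ Zj
    Z_new = Z + eta * grad
    return Z_new / np.linalg.norm(Z_new, axis=0, keepdims=True)  # project back to the sphere
```

Iterating layer_step a few dozen times on toy data while tracking rate_reduction gives a rough feel for how the unrolled, forward-constructed network increases ΔR without any back-propagation training.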
Oct-27-2020