Greedy Layer-Wise Training of Deep Networks
Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle
Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of the computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities, allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions.
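The training strategy named in the title is to pretrain the network one layer at a time in an unsupervised way, each layer learning to model the representation produced by the layer below, and then fine-tune the whole stack. Below is a minimal NumPy sketch of that greedy layer-wise scheme, assuming tied-weight sigmoid autoencoders trained by plain gradient descent; the helper names (`train_autoencoder`, `greedy_pretrain`) and all hyperparameters are illustrative, not the paper's exact setup, which studies RBM-based and autoencoder-based variants followed by supervised fine-tuning.

```python
# Illustrative sketch of greedy layer-wise unsupervised pretraining,
# assuming tied-weight sigmoid autoencoders (not the paper's exact models).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=50):
    """Train one autoencoder layer on inputs X; return its encoder parameters."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))  # encoder weights (tied: decoder uses W.T)
    b = np.zeros(n_hidden)                           # encoder bias
    c = np.zeros(n_in)                               # decoder bias
    for _ in range(epochs):
        H = sigmoid(X @ W + b)        # hidden code
        R = sigmoid(H @ W.T + c)      # reconstruction of the input
        # Gradients of the cross-entropy reconstruction loss with tied weights.
        dR = (R - X) / X.shape[0]     # grad w.r.t. decoder pre-activation
        dH = (dR @ W) * H * (1 - H)   # backprop through the hidden code
        W -= lr * (X.T @ dH + dR.T @ H)  # both uses of the tied weight matrix
        b -= lr * dH.sum(axis=0)
        c -= lr * dR.sum(axis=0)
    return W, b

def greedy_pretrain(X, layer_sizes):
    """Train layers one at a time, each on the previous layer's representation."""
    params, H = [], X
    for n_hidden in layer_sizes:
        W, b = train_autoencoder(H, n_hidden)
        params.append((W, b))
        H = sigmoid(H @ W + b)  # freeze this layer; its code is the next layer's input
    return params               # used to initialize a deep net before fine-tuning

# Toy usage: pretrain a 20-16-8 stack on random binary data.
X = (rng.random((256, 20)) > 0.5).astype(float)
params = greedy_pretrain(X, layer_sizes=[16, 8])
print([W.shape for W, _ in params])  # [(20, 16), (16, 8)]
```

The point of the greedy scheme is that each unsupervised layer gives the next one a better-conditioned starting point than random initialization, which is the failure mode the abstract describes.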
Neural Information Processing Systems
December 31, 2007