Neural Information Processing Systems 

The key idea of this paper is to use the latent vectors generated by the upper layers of a Deep Belief Network as target vectors: a cross-entropy regularization term is added to the standard unsupervised training loss, encouraging the bottom-up and top-down reconstructions to match. Instead of the standard two-stage training of deep architectures (unsupervised layer-by-layer pretraining, followed by supervised training of the full network), training is conducted in three stages, with an intermediate stage that uses this hybrid loss. Experimental results show that the intermediate stage substantially improves results on MNIST and Caltech101. Regularizing the intermediate layers via the top-down generative weights works well, and the paper shows how this intuitive way of bridging unsupervised and supervised training indeed improves performance. The paper is overall rather clear, although there are some problems (see below).
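To make the hybrid loss concrete, here is a minimal sketch of one plausible reading of the idea, not the paper's actual implementation: a single sigmoid autoencoder layer whose standard reconstruction cross-entropy is augmented with a second cross-entropy term that pulls the bottom-up reconstruction toward a target generated top-down by the upper DBN layers. The function name, the weighting factor `alpha`, and the tied-weight decoder are all assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hybrid_layer_loss(v, W, b_h, b_v, top_down_target, alpha=0.5):
    """Hypothetical sketch of the hybrid loss described in the review.

    v               -- input batch, values in (0, 1)
    W, b_h, b_v     -- tied encoder/decoder weights and biases (assumed)
    top_down_target -- reconstruction of v generated by the upper DBN layers
    alpha           -- assumed weight on the regularization term
    """
    h = sigmoid(v @ W + b_h)          # bottom-up encoding
    v_rec = sigmoid(h @ W.T + b_v)    # bottom-up reconstruction
    eps = 1e-7                        # numerical safety for the logs
    # standard unsupervised term: reconstruction cross-entropy against the input
    recon = -np.mean(v * np.log(v_rec + eps)
                     + (1 - v) * np.log(1 - v_rec + eps))
    # regularization term: cross-entropy between the bottom-up reconstruction
    # and the top-down target, encouraging the two reconstructions to match
    t = top_down_target
    reg = -np.mean(t * np.log(v_rec + eps)
                   + (1 - t) * np.log(1 - v_rec + eps))
    return recon + alpha * reg
```

In the intermediate training stage this loss would replace the purely unsupervised one, so the gradient of the regularization term nudges each layer's representation toward what the generative (top-down) pathway already predicts.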