Semi-supervised Sequence Learning

Andrew M. Dai, Quoc V. Le

Neural Information Processing Systems 

We present two approaches to use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a language model in NLP. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a "pretraining" algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models. In our experiments, we find that long short-term memory recurrent networks pretrained with the two approaches become more stable to train and generalize better. With pretraining, we were able to achieve strong performance in many classification tasks, such as text classification with IMDB and DBpedia, and image recognition with CIFAR-10.
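As a rough sketch of how the two unsupervised objectives differ, the snippet below shows how input/target pairs could be constructed from an unlabeled token sequence: next-step prediction for the language model, and full reconstruction for the sequence autoencoder. The helper names are illustrative only, not from the paper.

```python
def lm_pairs(seq):
    # Language-model objective: at each position, the target is the next token.
    return list(zip(seq[:-1], seq[1:]))

def autoencoder_pairs(seq, sos="<sos>"):
    # Sequence-autoencoder objective: the encoder reads the whole sequence
    # into a vector; the decoder, started from a start-of-sequence token,
    # then reconstructs the same sequence token by token.
    encoder_input = list(seq)
    decoder_input = [sos] + list(seq[:-1])
    decoder_target = list(seq)
    return encoder_input, decoder_input, decoder_target

tokens = ["the", "movie", "was", "great"]
print(lm_pairs(tokens))
# → [('the', 'movie'), ('movie', 'was'), ('was', 'great')]
print(autoencoder_pairs(tokens))
# → (['the', 'movie', 'was', 'great'],
#    ['<sos>', 'the', 'movie', 'was'],
#    ['the', 'movie', 'was', 'great'])
```

In either case, the recurrent network trained on these pairs supplies the initial weights for the downstream supervised classifier.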