Towards Better Variational Encoder-Decoders in Seq2Seq Tasks
Shen, Xiaoyu (Max Planck Institute for Informatics) | Su, Hui (Software Institute, University of Chinese Academy of Sciences, China)
Variational encoder-decoders have shown promising results in seq2seq tasks. However, their training process is known to be difficult to control because latent variables tend to be ignored during decoding. In this paper, we thoroughly analyze the reason behind this training difficulty, compare different ways of alleviating it, and propose a new framework that significantly improves overall performance.
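The "latent variables tend to be ignored" problem is usually diagnosed through the KL term of the variational objective collapsing toward zero. A minimal sketch of that term, together with a linear KL-annealing weight schedule (one common mitigation; the schedule and its `warmup_steps` parameter are illustrative, not from the paper):

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.

    When the decoder ignores the latent code, the optimizer drives this
    term to zero (posterior collapses onto the prior) -- "KL vanishing".
    """
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def kl_weight(step, warmup_steps=10000):
    """Linear KL-annealing: ramp the KL coefficient from 0 to 1 so the
    decoder learns to use the latent code before the KL penalty bites."""
    return min(1.0, step / warmup_steps)

# Toy check: the standard normal has zero KL to itself,
# and a shifted posterior pays a positive penalty.
mu, logvar = np.zeros(16), np.zeros(16)
collapsed_kl = gaussian_kl(mu, logvar)              # 0.0
shifted_kl = gaussian_kl(mu + 1.0, logvar)          # > 0
loss_weighting = kl_weight(step=2500)               # 0.25 of full KL
```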
Improving Variational Encoder-Decoders in Dialogue Generation
Shen, Xiaoyu (Max Planck Institute for Informatics) | Su, Hui (Software Institute, University of Chinese Academy of Sciences) | Niu, Shuzi (Software Institute, University of Chinese Academy of Sciences) | Demberg, Vera (Saarland University)
Variational encoder-decoders (VEDs) have shown promising results in dialogue generation. However, the latent variable distributions are usually approximated by a much simpler model than the powerful RNN structure used for encoding and decoding, leading to the KL-vanishing problem and an inconsistent training objective. In this paper, we separate training into two phases: the first phase learns to autoencode discrete texts into continuous embeddings, from which the second phase learns to generalize latent representations by reconstructing the encoded embedding. In this setting, latent variables are sampled by transforming Gaussian noise through multi-layer perceptrons and are trained with a separate VED model, which can realize a much more flexible distribution. We compare our model with current popular models, and experiments demonstrate substantial improvement in both metric-based and human evaluations.
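The sampling mechanism described above, pushing Gaussian noise through a multi-layer perceptron so the implied latent distribution is no longer restricted to a Gaussian, can be sketched as follows (layer sizes, the tanh nonlinearity, and the weight initialization are illustrative assumptions, not the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_transform_sample(n, noise_dim=32, latent_dim=64, hidden=128):
    """Sample latent codes by transforming Gaussian noise with a 2-layer MLP.

    Because the MLP is a nonlinear map, the pushforward distribution of
    z = MLP(eps) can be far more flexible than a diagonal Gaussian.
    Weights are random here; in training they would be learned jointly
    with the VED model.
    """
    W1 = rng.standard_normal((noise_dim, hidden)) * 0.1
    b1 = np.zeros(hidden)
    W2 = rng.standard_normal((hidden, latent_dim)) * 0.1
    b2 = np.zeros(latent_dim)

    eps = rng.standard_normal((n, noise_dim))   # base Gaussian noise
    h = np.tanh(eps @ W1 + b1)                  # nonlinear hidden layer
    return h @ W2 + b2                          # flexible latent samples

z = mlp_transform_sample(8)                     # 8 latent codes of dim 64
```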
Dialogue Generation With GAN
Su, Hui (The Hong Kong Polytechnic University) | Shen, Xiaoyu (Max Planck Institute for Informatics) | Hu, Pengwei (The Hong Kong Polytechnic University) | Li, Wenjie (The Hong Kong Polytechnic University) | Chen, Yun (The University of Hong Kong)
This paper presents a Generative Adversarial Network (GAN) for modeling multi-turn dialogue generation, which trains a latent hierarchical recurrent encoder-decoder jointly with a discriminative classifier that makes the prior approximate the posterior. Experiments show that our model achieves better results.
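The adversarial prior-posterior matching above can be sketched with the standard GAN objective applied to latent samples: a discriminator is trained to tell posterior samples (labeled 1) from prior samples (labeled 0), and the prior network is trained to fool it. The loss functions below are the usual non-saturating GAN losses, shown on raw logits as an illustration of the idea rather than the paper's exact formulation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(d_posterior, d_prior):
    """Binary cross-entropy on discriminator logits: posterior latent
    samples are the 'real' class (1), prior samples the 'fake' class (0)."""
    return -(np.mean(np.log(sigmoid(d_posterior) + 1e-8)) +
             np.mean(np.log(1.0 - sigmoid(d_prior) + 1e-8)))

def prior_loss(d_prior):
    """Non-saturating generator loss: the prior network is updated so its
    samples are classified as posterior samples, pulling the prior toward
    the posterior."""
    return -np.mean(np.log(sigmoid(d_prior) + 1e-8))

# At logits of 0 the discriminator is maximally uncertain (p = 0.5),
# so its loss equals 2*log(2).
d_loss = discriminator_loss(np.zeros(4), np.zeros(4))
```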