Variational encoder-decoders have shown promising results in seq2seq tasks. However, the training process is known to be difficult to control because latent variables tend to be ignored during decoding. In this paper, we thoroughly analyze the reason behind this training difficulty, compare different ways of alleviating it, and propose a new framework that significantly improves overall performance.
Su, Hui (The Hong Kong Polytechnic University) | Shen, Xiaoyu (Max Planck Institute Informatics) | Hu, Pengwei (The Hong Kong Polytechnic University) | Li, Wenjie (The Hong Kong Polytechnic University) | Chen, Yun (The University of Hong Kong)
This paper presents a Generative Adversarial Network (GAN) to model multi-turn dialogue generation, which trains a latent hierarchical recurrent encoder-decoder simultaneously with a discriminative classifier that makes the prior approximate the posterior. Experiments show that our model achieves better results.
Serban, Iulian Vlad (University of Montreal) | Sordoni, Alessandro (Maluuba Inc) | Lowe, Ryan (McGill University) | Charlin, Laurent (HEC Montréal) | Pineau, Joelle (McGill University) | Courville, Aaron (University of Montreal) | Bengio, Yoshua (University of Montreal)
Sequential data often possesses hierarchical structures with complex dependencies between sub-sequences, such as found between the utterances in a dialogue. To model these dependencies in a generative framework, we propose a neural network-based generative architecture, with stochastic latent variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with other recent neural-network architectures. We evaluate the model performance through a human evaluation study. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate both the generation of meaningful, long and diverse responses and maintaining dialogue state.
Diversity plays a vital role in many text generation applications. In recent years, Conditional Variational Auto Encoders (CVAE) have shown promising performance on this task. However, they often encounter the so-called KL-vanishing problem. Previous works mitigated this problem with heuristic methods, such as strengthening the encoder or weakening the decoder while optimizing the CVAE objective function. Nevertheless, the optimizing direction of these methods is implicit, and it is hard to find an appropriate degree to which they should be applied. In this paper, we propose an explicit optimizing objective that complements the CVAE and directly pulls it away from KL-vanishing. In fact, this objective term guides the encoder towards the "best encoder" of the decoder to enhance the expressiveness. A labeling network is introduced to estimate the "best encoder". It provides a continuous label in the latent space of the CVAE to help build a close connection between latent variables and targets. The whole proposed method is named Self Labeling CVAE (SLCVAE). To accelerate research on diverse text generation, we also propose a large native one-to-many dataset. Extensive experiments are conducted on two tasks, which show that our method largely improves generation diversity while achieving accuracy comparable to state-of-the-art algorithms.
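As a rough illustration of the KL-vanishing problem mentioned in the abstract above, the following Python sketch computes the KL term of a CVAE-style objective. It assumes diagonal Gaussian prior and posterior distributions, which the abstract does not state, and the names `gaussian_kl` and `cvae_loss` are hypothetical, chosen only for this example. When the posterior collapses onto the prior, the KL term is zero and the latent variable carries no information about the target.

```python
import math


def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between two diagonal Gaussians.

    Each argument is a list of per-dimension means or standard deviations.
    Assumption: Gaussian prior/posterior; the abstract does not specify this.
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def cvae_loss(recon_nll, kl, kl_weight=1.0):
    """Negative ELBO: reconstruction loss plus (optionally weighted) KL term."""
    return recon_nll + kl_weight * kl


# Posterior identical to the prior: the KL term vanishes.
collapsed = gaussian_kl([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])

# Posterior mean shifted by one unit in a single dimension: KL is positive.
informative = gaussian_kl([1.0], [1.0], [0.0], [1.0])
```

A decoder that is strong enough to model the target on its own can minimize `cvae_loss` by driving the KL term to zero, which is the collapsed case shown here.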
Advances in neural variational inference have facilitated the learning of powerful directed graphical models with continuous latent variables, such as variational autoencoders. The hope is that such models will learn to represent rich, multi-modal latent factors in real-world data, such as natural language text. However, current models often assume simplistic priors on the latent variables - such as the uni-modal Gaussian distribution - which are incapable of representing complex latent factors efficiently. To overcome this restriction, we propose the simple, but highly flexible, piecewise constant distribution. This distribution has the capacity to represent an exponential number of modes of a latent target distribution, while remaining mathematically tractable. Our results demonstrate that incorporating this new latent distribution into different models yields substantial improvements in natural language processing tasks such as document modeling and natural language generation for dialogue.
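To make the idea of a piecewise constant distribution concrete, here is a minimal Python sketch of a one-dimensional density on [0, 1] split into equal-width pieces, with sampling by inverse CDF. The parameterization (positive unnormalized weights, equal-width pieces) and the function names `piecewise_density` and `piecewise_sample` are assumptions made for illustration; the abstract does not give the paper's exact formulation.

```python
def piecewise_density(x, weights):
    """Density at x in [0, 1] for n equal-width pieces with positive weights."""
    n = len(weights)
    total = sum(weights)
    i = min(int(x * n), n - 1)  # index of the piece containing x
    return n * weights[i] / total


def piecewise_sample(weights, u):
    """Map a uniform draw u in [0, 1) to a sample via the inverse CDF."""
    n = len(weights)
    total = sum(weights)
    cum = 0.0
    for i, w in enumerate(weights):
        mass = w / total  # probability mass of piece i
        if u < cum + mass or i == n - 1:
            frac = min((u - cum) / mass, 1.0)  # position inside piece i
            return (i + frac) / n
        cum += mass


# Two pieces: the second holds three times the mass of the first.
weights = [1.0, 3.0]
low = piecewise_sample(weights, 0.125)   # falls in the first piece
high = piecewise_sample(weights, 0.625)  # falls in the second piece
```

Because each piece has its own weight, a single latent dimension can place high density on several separate intervals, which a uni-modal Gaussian cannot do.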