Quispe, Guillaume
Diffusion bridges vector quantized Variational AutoEncoders
Cohen, Max, Quispe, Guillaume, Corff, Sylvain Le, Ollion, Charles, Moulines, Eric
Vector Quantised-Variational AutoEncoders (VQ-VAE) are generative models based on discrete latent representations of the data, where inputs are mapped to a finite set of learned embeddings. To generate new samples, an autoregressive prior distribution over the discrete states must be trained separately. This prior is generally very complex and leads to very slow generation. In this work, we propose a new model that trains the prior and the encoder/decoder networks simultaneously. We build a diffusion bridge between a continuous coded vector and a non-informative prior distribution. The latent discrete states are then given as random functions of these continuous vectors. We show that our model is competitive with the autoregressive prior on the mini-Imagenet dataset and is very efficient in both optimization and sampling. Our framework also extends the standard VQ-VAE and enables end-to-end training.
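As an illustrative sketch only (the VQ step common to all VQ-VAEs, not the paper's diffusion-bridge construction), mapping continuous encoder outputs to a finite set of learned embeddings amounts to a nearest-neighbour lookup in the codebook:

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each continuous encoder output to its nearest codebook embedding.

    z: (n, d) array of encoder outputs; codebook: (K, d) learned embeddings.
    Returns the discrete latent indices and the quantized vectors.
    """
    # Squared Euclidean distance between every z and every embedding
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)          # discrete latent state per input
    return idx, codebook[idx]        # quantized continuous vectors

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # K = 8 embeddings of dimension 4 (toy sizes)
z = rng.normal(size=(5, 4))          # 5 encoder outputs
idx, z_q = vector_quantize(z, codebook)
```

In a standard VQ-VAE the resulting indices `idx` are the discrete states over which the autoregressive prior is fit; the paper replaces that separately-trained prior with a diffusion bridge over the continuous vectors.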
Learning Natural Language Generation from Scratch
Donati, Alice Martin, Quispe, Guillaume, Ollion, Charles, Corff, Sylvain Le, Strub, Florian, Pietquin, Olivier
Since the development of generic language models trained on massive unlabelled text corpora (Radford et al., 2019; Brown et al., 2020), state-of-the-art language processing systems rely on sequential transfer learning (Ruder, 2019). The pretrained Language Model (LM) is fine-tuned on the downstream task using a standard supervised learning (SL) objective (Wu et al., 2019; Peters et al., 2019). Yet, such an approach suffers from several issues (Chen et al., 2020): (i) catastrophic forgetting, when a model forgets previously learned knowledge and overfits to target domains, (ii) computational inefficiency from fine-tuning billion-parameter networks, and (iii) the need for supervised datasets. Moreover, task-specific language models learned with SL suffer from well-studied text degeneration issues (Holtzman et al., 2019), such as exposure bias (Bengio et al., 2015), language biases (Saleh et al., 2020; Jaques et al., 2020), or a lack of diversity (Li et al., 2015). On the other hand, text generation can be naturally framed as a sequential decision making problem, with the sequence of words seen as successive actions over a vocabulary. Thus, some researchers have recently focused on learning language models using Reinforcement Learning (RL) instead (Strub et al., 2017; Das et al., 2017; Narasimhan et al., 2015).
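The sequential decision making framing can be illustrated with a toy REINFORCE loop: words are actions sampled over a vocabulary, and the policy's log-probabilities are pushed up in proportion to a sequence-level reward. This is a minimal sketch, not the paper's method; the vocabulary size, tabular "policy", and reward function below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len = 5, 4
# Toy policy: one logit vector per position (a real LM would condition on history)
logits = np.zeros((seq_len, vocab_size))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sample_sequence(logits):
    """Generate a sequence word by word: each word is an action over the vocabulary."""
    probs = softmax(logits)
    return np.array([rng.choice(vocab_size, p=probs[t]) for t in range(seq_len)])

def reinforce_update(logits, seq, reward, lr=0.1):
    """REINFORCE step: gradient of log p(a_t) w.r.t. logits, scaled by the reward."""
    probs = softmax(logits)
    grad = -probs
    grad[np.arange(seq_len), seq] += 1.0   # d log p(a_t) / d logits_t
    return logits + lr * reward * grad

# Hypothetical reward: +1 if the sequence ends with token 0, else 0
for _ in range(200):
    seq = sample_sequence(logits)
    reward = 1.0 if seq[-1] == 0 else 0.0
    logits = reinforce_update(logits, seq, reward)
```

After training, the policy concentrates probability on token 0 at the final position, since only that action correlates with reward; the other positions receive no consistent gradient signal.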