
We thank our reviewers for their time and valuable comments

Neural Information Processing Systems

We thank our reviewers for their time and valuable comments. We have observed in the literature and also from personal communication at recent conferences incl. We feel this paper will have a significant impact by showing that stable training can be obtained with REINFORCE. We agree with your point that we are dismissing non-autoregressive language models. We have addressed these typos; thank you for noting them!


Appendix A Implementation of Taylor Expansion on Unit Hamming Sphere

Neural Information Processing Systems

Following the discussion in Section 3.1.2, in this section we continue the discussion in Section 3.3 and obtain the form used in (25). The architecture of the discriminator is shown in Table 3; Exponential Linear Units are used as activations. One issue with BLEU is that when any higher-order n-gram precision of a sentence is 0, the BLEU score is 0, resulting in severe underestimation. This is because BLEU is computed as the geometric mean of the n-gram precisions. Sentences in the COCO dataset have a maximum length of 24 tokens and a vocabulary of 4.6k tokens. Training and validation data both consist of 10k sentences.
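To make the geometric-mean issue concrete, the following is a minimal sketch of sentence-level BLEU against a single reference, with the brevity penalty omitted and simple clipped n-gram precisions; it is an illustration, not the evaluation code used in the paper. A candidate with reasonable unigram overlap but no matching 4-gram still scores 0.

import math
from collections import Counter

def ngram_precision(candidate, reference, n):
    # Clipped n-gram precision of a candidate sentence against one reference.
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(sum(cand.values()), 1)

def sentence_bleu(candidate, reference, max_n=4):
    # Geometric mean of the 1..max_n gram precisions (brevity penalty omitted).
    # If any precision is 0, the geometric mean, and hence BLEU, collapses to 0.
    precisions = [ngram_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

candidate = "a cat sat quietly on mat".split()
reference = "the cat sat on the mat".split()
print(sentence_bleu(candidate, reference))  # 0.0: no 4-gram matches despite unigram overlap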


Training language GANs from Scratch

d'Autume, Cyprien de Masson, Rosca, Mihaela, Rae, Jack, Mohamed, Shakir

arXiv.org Machine Learning

Generative Adversarial Networks (GANs) enjoy great success at image generation, but have proven difficult to train in the domain of natural language. Challenges with gradient estimation, optimization instability, and mode collapse have led practitioners to resort to maximum likelihood pre-training, followed by small amounts of adversarial fine-tuning. The benefits of GAN fine-tuning for language generation are unclear, as the resulting models produce comparable or worse samples than traditional language models. We show it is in fact possible to train a language GAN from scratch -- without maximum likelihood pre-training. We combine existing techniques such as large batch sizes, dense rewards and discriminator regularization to stabilize and improve language GANs. The resulting model, ScratchGAN, performs comparably to maximum likelihood training on EMNLP2017 News and WikiText-103 corpora according to quality and diversity metrics.
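As a rough illustration of the training signal described in the abstract, the sketch below shows a REINFORCE-style generator loss with dense per-timestep discriminator rewards and a simple variance-reducing baseline. It is a sketch only: the tensor shapes, discount factor, and batch-mean baseline are assumptions for illustration, not the authors' implementation.

import torch

def reinforce_generator_loss(log_probs, rewards, gamma=0.99):
    # log_probs: [batch, T] log-probabilities of the sampled tokens under the generator.
    # rewards:   [batch, T] per-timestep rewards, e.g. discriminator scores for each prefix.
    batch, seq_len = rewards.shape
    # Discounted return-to-go R_t = sum_{s >= t} gamma^(s - t) * r_s, computed backwards.
    returns = torch.zeros_like(rewards)
    running = rewards.new_zeros(batch)
    for t in reversed(range(seq_len)):
        running = rewards[:, t] + gamma * running
        returns[:, t] = running
    # A simple per-timestep batch-mean baseline reduces the variance of the estimator.
    advantage = returns - returns.mean(dim=0, keepdim=True)
    # Score-function (REINFORCE) objective: maximize E[advantage * log pi]; the advantage
    # is treated as a constant, so gradients flow only through log_probs.
    return -(advantage.detach() * log_probs).mean()

# Toy usage: a large batch of sampled sequences with random rewards.
log_probs = torch.randn(512, 24, requires_grad=True)
rewards = torch.rand(512, 24)
loss = reinforce_generator_loss(log_probs, rewards)
loss.backward()

Dense rewards give the generator a learning signal at every timestep rather than only at the end of the sequence, which, together with large batches, helps keep the variance of this score-function estimator manageable.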