ScratchGAN
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.78)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.78)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
We thank our reviewers for their time and valuable comments. We have observed in the literature and also from personal communication at recent conferences incl. We feel this paper will have a significant impact by showing that stable training can be obtained with REINFORCE. We agree with your point that we are dismissing non-autoregressive language models. We have addressed these typos; thank you for noting them!
Appendix A: Implementation of Taylor Expansion on Unit Hamming Sphere
Following the discussion in Section 3.1.2, in this section we continue the discussion of Section 3.3 and obtain the form used in (25). The architecture of the discriminator is shown in Table 3; it uses Exponential Linear Units.

One issue with BLEU is that if any higher-order n-gram precision of a sentence is 0, the whole BLEU score is 0, resulting in severe underestimation. This is because BLEU is computed as the geometric mean of the n-gram precisions.

Sentences in the COCO dataset have a maximum length of 24 tokens and a vocabulary of 4.6k. Training and validation data each consist of 10k sentences.
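The BLEU failure mode described above can be reproduced directly. A minimal sketch (pure Python, brevity penalty omitted for clarity; `bleu` and `modified_precision` are illustrative helpers, not the paper's evaluation code): a candidate that shares unigrams and bigrams with the reference but no 4-gram scores exactly 0, because the geometric mean collapses as soon as one precision is zero.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    # Clipped n-gram counts, as in standard BLEU.
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    if not cand:
        return 0.0
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, reference, max_n=4):
    # Geometric mean of the n-gram precisions (brevity penalty omitted):
    # a single zero precision zeroes the entire score.
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    return exp(sum(log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
cand = "the cat the mat".split()  # shares unigrams and bigrams, no shared 4-gram
print(bleu(cand, ref))  # 0.0: the zero 4-gram precision dominates
```

Smoothed BLEU variants avoid this collapse by replacing zero precisions with small positive values before taking the geometric mean.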
Training language GANs from Scratch
Cyprien de Masson d'Autume, Mihaela Rosca, Jack Rae, Shakir Mohamed
Generative Adversarial Networks (GANs) enjoy great success at image generation, but have proven difficult to train in the domain of natural language. Challenges with gradient estimation, optimization instability, and mode collapse have led practitioners to resort to maximum likelihood pre-training, followed by small amounts of adversarial fine-tuning. The benefits of GAN fine-tuning for language generation are unclear, as the resulting models produce comparable or worse samples than traditional language models. We show it is in fact possible to train a language GAN from scratch -- without maximum likelihood pre-training. We combine existing techniques such as large batch sizes, dense rewards and discriminator regularization to stabilize and improve language GANs. The resulting model, ScratchGAN, performs comparably to maximum likelihood training on EMNLP2017 News and WikiText-103 corpora according to quality and diversity metrics.
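The ingredients named in the abstract (REINFORCE gradients, dense per-token rewards, large batches) can be illustrated with a minimal sketch. This is not the paper's implementation: the per-step logits are an arbitrary stand-in for a generator, and the reward function is a toy stand-in for a discriminator's per-token score. It only shows the shape of the estimator: score-function gradients with a batch-mean baseline for variance reduction.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, seq_len, batch = 5, 4, 512  # a large batch reduces estimator variance

# Hypothetical generator: one logit vector per time step
# (no recurrence, purely for illustration).
logits = rng.normal(size=(seq_len, vocab))

def sample_and_grad(logits, reward_fn):
    # REINFORCE with dense per-token rewards and a batch-mean baseline.
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    grad = np.zeros_like(logits)
    for t in range(seq_len):
        tokens = rng.choice(vocab, size=batch, p=probs[t])
        r = reward_fn(t, tokens)            # stand-in for D's per-token score
        adv = r - r.mean()                  # baseline reduces variance
        for b in range(batch):
            # grad of log-softmax w.r.t. logits: one-hot minus probs
            g = -probs[t].copy()
            g[tokens[b]] += 1.0
            grad[t] += adv[b] * g
    return grad / batch

# Toy "discriminator" reward: prefers token 0 at every step.
grad = sample_and_grad(logits, lambda t, toks: (toks == 0).astype(float))
```

Gradient ascent on `grad` increases the probability of rewarded tokens; in a full language GAN this estimator is applied to a recurrent generator with the discriminator supplying the per-token rewards.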