A Numerically stable Multinomial Diffusion in log space

Neural Information Processing Systems 

B.1 Language Modelling

For the language modelling experiments we use the standard text8 dataset with sequence length 256 and the enwik8 dataset with sequence length 320. The train/val/test splits are 90,000,000/5,000,000/5,000,000 for both text8 and enwik8, as is standard in the literature. The Multinomial Text Diffusion models are trained for 300 epochs, whereas the Argmax Flows are trained for 40 epochs, with the exception of the Argmax Coupling Flow on enwik8, which only needs to be trained for 20 epochs. Further details are presented in Tables 6 and 7. In addition, the code to reproduce the results will be made publicly available. There are no known ethics issues with these datasets at the time of writing.
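
The data preparation described above can be illustrated with a minimal sketch. It is not the authors' released code. It assumes that the three splits are taken contiguously in the order train, validation, test, and that each split is cut into non-overlapping sequences with any trailing remainder dropped. The names `split_corpus`, `chunk`, `SPLIT_SIZES`, and `SEQ_LEN` are hypothetical and introduced here only for illustration.

```python
from typing import List, Tuple

# Split sizes quoted in the text (the same for text8 and enwik8).
SPLIT_SIZES: Tuple[int, int, int] = (90_000_000, 5_000_000, 5_000_000)
# Sequence lengths quoted in the text.
SEQ_LEN = {"text8": 256, "enwik8": 320}


def split_corpus(
    data: bytes, sizes: Tuple[int, int, int] = SPLIT_SIZES
) -> Tuple[bytes, bytes, bytes]:
    """Cut a corpus into contiguous train/val/test pieces.

    Assumption: the splits are contiguous and ordered train, val, test.
    """
    n_train, n_val, n_test = sizes
    if len(data) < n_train + n_val + n_test:
        raise ValueError("corpus is shorter than the requested splits")
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


def chunk(split: bytes, seq_len: int) -> List[bytes]:
    """Cut a split into non-overlapping sequences of length seq_len.

    Assumption: a trailing remainder shorter than seq_len is dropped.
    """
    n_sequences = len(split) // seq_len
    return [split[i * seq_len:(i + 1) * seq_len] for i in range(n_sequences)]
```

Under these assumptions, the text8 training split would yield 90,000,000 // 256 = 351,562 sequences (with 128 symbols left over), and the enwik8 training split would yield exactly 90,000,000 / 320 = 281,250 sequences. A different chunking scheme, such as overlapping windows, would give different counts.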