Neural Information Processing Systems 

Varied number of bits split: To generate the samples in this split, we first sampled the number of bits, then sampled each bit individually from a uniform Bernoulli distribution.

Varied number of ones split: Here, we fixed the number of bits at 30.

Natural Language Parity Dataset: In order to tap into the natural language understanding capabilities of pretrained language models, we situated the parity task as a "coin flip problem".

We trained baseline models with the same parameter count on a modified version of the variable assignment dataset where the order of the operations was randomly shuffled.

We used greedy decoding in all of our experiments (including few-shot scratchpad ones).
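As an illustration of the two parity splits described above, the following minimal sketch samples one example from each. It is not the authors' code. The function names, the length range `min_bits`/`max_bits`, and the choice to place a given number of ones at uniformly random positions in the varied number of ones split are all assumptions made for this example; the text only states that the number of bits is sampled first, that each bit is drawn from a uniform Bernoulli distribution, and that the second split fixes the number of bits at 30.

```python
import random


def sample_varied_length(rng, min_bits=1, max_bits=30):
    """Varied number of bits split: sample a length, then each bit independently.

    The [min_bits, max_bits] range is a hypothetical choice for this sketch.
    """
    n_bits = rng.randint(min_bits, max_bits)
    # Each bit is an independent draw from Bernoulli(0.5).
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    parity = sum(bits) % 2
    return bits, parity


def sample_varied_ones(rng, n_ones, n_bits=30):
    """Varied number of ones split: the sequence length is fixed at 30.

    Placing exactly n_ones ones at uniformly random positions is an
    assumption of this sketch, not a detail given in the text.
    """
    one_positions = set(rng.sample(range(n_bits), n_ones))
    bits = [1 if i in one_positions else 0 for i in range(n_bits)]
    parity = n_ones % 2
    return bits, parity


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    print(sample_varied_length(rng))
    print(sample_varied_ones(rng, n_ones=7))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible and makes it straightforward to generate separate training and evaluation splits from different seeds.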