Exposing Attention Glitches with Flip-Flop Language Modeling

Neural Information Processing Systems 

The flip-flop language modeling (FFLM) task is a simple generative task that requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques.
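
A minimal sketch of how such sequences might be generated, assuming a write/read/ignore instruction vocabulary paired with binary symbols (the exact token set and sampling probabilities here are illustrative assumptions, not taken from the paper). A model trained on these sequences must reproduce the last written bit at every read position, no matter how many ignore tokens intervene.

```python
import random

def make_flipflop_sequence(length, p_ignore=0.8, rng=random):
    """Generate a toy flip-flop sequence of (instruction, bit) token pairs.

    Assumed vocabulary: "w" (write), "r" (read), "i" (ignore).
    At a read, the correct bit is the most recently written one.
    """
    tokens = ["w", str(rng.randint(0, 1))]  # always start by writing a bit
    memory = tokens[1]
    while len(tokens) < length:
        op = rng.choices(["w", "r", "i"], weights=[0.1, 0.1, p_ignore])[0]
        if op == "w":
            bit = str(rng.randint(0, 1))
            memory = bit                     # update the stored bit
        elif op == "r":
            bit = memory                     # target: copy the last written bit
        else:
            bit = str(rng.randint(0, 1))     # ignore tokens are pure noise
        tokens += [op, bit]
    return tokens

print(" ".join(make_flipflop_sequence(20)))
```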
