Exposing Attention Glitches with Flip-Flop Language Modeling

Neural Information Processing Systems 

The flip-flop language modeling (FFLM) task is a simple generative task that requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques.
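
A minimal sketch of how such sequences might be generated, assuming a write/read/ignore instruction vocabulary paired with binary symbols (the exact token set and sampling probabilities here are illustrative assumptions, not taken from the paper). A model trained on these sequences must reproduce the last written bit at every read position, no matter how many ignore tokens intervene.

```python
import random

def make_flipflop_sequence(length, p_ignore=0.8, rng=random):
    """Generate a toy flip-flop sequence of (instruction, bit) token pairs.

    Assumed vocabulary: "w" (write), "r" (read), "i" (ignore).
    At a read, the correct bit is the most recently written one.
    """
    tokens = ["w", str(rng.randint(0, 1))]  # always start by writing a bit
    memory = tokens[1]
    while len(tokens) < length:
        op = rng.choices(["w", "r", "i"], weights=[0.1, 0.1, p_ignore])[0]
        if op == "w":
            bit = str(rng.randint(0, 1))
            memory = bit                     # update the stored bit
        elif op == "r":
            bit = memory                     # target: copy the last written bit
        else:
            bit = str(rng.randint(0, 1))     # ignore tokens are pure noise
        tokens += [op, bit]
    return tokens

print(" ".join(make_flipflop_sequence(20)))
```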
