Exposing Attention Glitches with Flip-Flop Language Modeling

Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang

arXiv.org Artificial Intelligence 

Recent advancements in scale have yielded large language models (LLMs) with extraordinary proficiency in nuanced reasoning with factual knowledge. Despite these achievements, LLMs are known to produce incorrect outputs, often referred to colloquially as "hallucinations" or "distractions" (Ji et al., 2023). Generally, hallucinations refer to the phenomenon in which a model's outputs are syntactically and grammatically well-formed but factually incorrect. There are various types of hallucinations; the focus of this work is the "closed-domain" variety (Saparov and He, 2022; OpenAI, 2023), where the model's predictions contain factually incorrect or made-up information with respect to a given context, regardless of their correctness in the real world.

Perhaps surprisingly, such hallucinations can be observed even on simple algorithmic reasoning tasks. As a warmup, consider the queries shown in Figure 1 (and Appendix B.1), where we prompt LLMs to solve addition problems of various lengths. The responses simultaneously illustrate the following:

1. Nontrivial algorithmic generalization: In cases where the models succeed, it is unlikely that these exact numerical sequences appeared in the training data. To correctly output the first digit of the answer, the LLM must resolve a long dependency chain that generally depends on every digit of the input (see the sketch below). Somewhere within these networks' internal representations, implementations of addition algorithms have emerged.
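Because carries can propagate across an entire operand, the leading digit of a sum can hinge on the least significant digits of the inputs. The following minimal Python sketch (illustrative only; the specific numbers are not drawn from the paper or Figure 1) makes this dependency chain concrete:

    # Illustrative sketch: the leading digit of a sum depends on every input digit
    # because carries can propagate across the whole number.

    def leading_digit(n: int) -> int:
        """Return the most significant digit of a non-negative integer."""
        return int(str(n)[0])

    a, b = 4_999_999, 5_000_000
    print(a + b, leading_digit(a + b))   # 9999999 -> leading digit 9

    # Perturbing only the least significant digit of b changes every digit of the sum:
    b_perturbed = b + 1
    print(a + b_perturbed, leading_digit(a + b_perturbed))   # 10000000 -> leading digit 1

Flipping only the last digit of one operand changes every digit of the answer, including the first, so any procedure that outputs the answer's first digit correctly must aggregate information from the entire input.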
