The Mystery of the Pathological Path-star Task for Language Models

Oct-17-2024–arXiv.org Artificial Intelligence

The recently introduced path-star task is a minimal task designed to exemplify limitations to the abilities of language models (Bachmann and Nagarajan, 2024). It involves a path-star graph where multiple arms radiate from a single starting node and each node is unique. Given the start node and a specified target node that ends an arm, the task is to generate the arm containing that target node. This is straightforward for a human but surprisingly difficult for language models, which did not outperform the random baseline. The authors hypothesized this is due to a deficiency in teacher-forcing and the next-token prediction paradigm. We demonstrate the task is learnable using teacher-forcing in alternative settings and that the issue is partially due to representation. We introduce a regularization method using structured samples of the same graph but with differing target nodes, improving results across a variety of model types. We provide RASP proofs showing the task is theoretically solvable. Finally, we find settings where an encoder-only model can consistently solve the task.

large language model, machine learning, node, (19 more...)

arXiv.org Artificial Intelligence

Oct-17-2024

arXiv.org PDF

Add feedback

Country:
- North America
  - United States
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
  - Canada > Ontario
    - Toronto (0.14)
- Europe
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Singapore (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.68)
  - Machine Learning > Neural Networks (0.68)