Extracting Finite State Machines from Transformers

Oct-8-2024, arXiv.org Artificial Intelligence

Fueled by the popularity of the transformer architecture in deep learning, several works have investigated which formal languages a transformer can learn. Nonetheless, existing results remain hard to compare, and a fine-grained understanding of the trainability of transformers on regular languages is still lacking. We investigate transformers trained on regular languages from a mechanistic interpretability perspective. Using an extension of the $L^*$ algorithm, we extract Moore machines from transformers. We empirically find tighter lower bounds on the trainability of transformers when a finite number of symbols determines the state. Additionally, our mechanistic insight allows us to characterise the regular languages a one-layer transformer can learn with good length generalisation. However, we also identify failure cases where the determining symbols get misrecognised due to saturation of the attention mechanism.
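
For readers unfamiliar with the term, a Moore machine is a finite state machine whose output depends only on the current state. The following is a minimal illustrative sketch in Python, not the extraction procedure described above: the class `MooreMachine`, the instance `ends_in_a`, and the example language (strings over {a, b} that end in "a") are assumptions chosen for illustration. The example is a case where a finite number of symbols (here, only the last one) determines the state.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class MooreMachine:
    """A Moore machine: every state carries an output (hypothetical helper)."""
    initial: str
    transitions: Dict[Tuple[str, str], str]  # (state, symbol) -> next state
    outputs: Dict[str, int]                  # state -> output (1 = accepting)

    def run(self, word: Iterable[str]) -> List[int]:
        """Return the output emitted after each prefix of `word`."""
        state = self.initial
        trace = [self.outputs[state]]        # output for the empty prefix
        for symbol in word:
            state = self.transitions[(state, symbol)]
            trace.append(self.outputs[state])
        return trace

    def accepts(self, word: Iterable[str]) -> bool:
        """A word is accepted if the final state's output is 1."""
        return self.run(word)[-1] == 1


# Assumed example language: strings over {a, b} ending in "a".
# The state after reading a non-empty word depends only on its last symbol.
ends_in_a = MooreMachine(
    initial="q0",
    transitions={
        ("q0", "a"): "q1",
        ("q0", "b"): "q0",
        ("q1", "a"): "q1",
        ("q1", "b"): "q0",
    },
    outputs={"q0": 0, "q1": 1},
)

print(ends_in_a.run("abba"))     # per-prefix outputs
print(ends_in_a.accepts("abba"))
```

Recording an output after every prefix, rather than only at the end of the word, mirrors how a sequence model produces a prediction at every position, which is one reason Moore machines are a natural target when comparing an automaton against a transformer.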

