Comparison of different Unique hard attention transformer models by the formal languages they can recognize

Ryvkin, Leonid

arXiv.org Artificial Intelligence 

The goal of this note is to give an overview of the capabilities of different flavors of unique hard attention transformer encoders in terms of the formal languages they are able to recognize. This study is relevant in the context of the rising use of large language models, which typically follow a transformer architecture. While the model we will primarily investigate has features very distinct from real-world transformers (we will comment on the distinction later), it can still give valuable insight into the principles underlying transformer capabilities. Roughly speaking, a transformer can be thought of as a function that, given an input sequence of any length, constructs an output sequence of the same length. It transforms one sequence into another.
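The length-preserving, sequence-to-sequence view above can be illustrated with a minimal sketch of the attention mechanism the note studies. This is an assumption-laden toy, not the paper's formal model: scores are given directly rather than computed from queries and keys, values are scalars, and ties are broken leftmost (as `np.argmax` returns the first maximal index), which corresponds to one common "leftmost" flavor of unique hard attention.

```python
import numpy as np

def unique_hard_attention(scores, values):
    """Unique hard attention: each query position attends to exactly ONE
    key position -- the one with the highest score. Ties break leftmost,
    since np.argmax returns the first maximal index."""
    idx = np.argmax(scores, axis=-1)  # one selected position per query
    return values[idx]                # output has the same length as the input

# Toy example: 4 input positions with scalar values.
scores = np.array([
    [0.1, 0.9, 0.3, 0.9],  # position 0 attends to position 1 (leftmost max)
    [0.5, 0.2, 0.8, 0.1],  # position 1 attends to position 2
    [0.7, 0.7, 0.1, 0.0],  # position 2 attends to position 0 (leftmost max)
    [0.0, 0.0, 0.0, 1.0],  # position 3 attends to position 3
])
values = np.array([10.0, 20.0, 30.0, 40.0])

out = unique_hard_attention(scores, values)  # -> [20. 30. 10. 40.]
```

Note that, unlike soft attention, the output is not a weighted average: each position copies exactly one value, which is what makes these models amenable to formal-language analysis.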
