Comparison of different Unique hard attention transformer models by the formal languages they can recognize

Ryvkin, Leonid

arXiv.org Artificial Intelligence 

The goal of this note is to give an overview of the capabilities of different flavors of unique hard attention transformer encoders in terms of the formal languages they are able to recognize. This study is relevant in the context of the rising use of large language models, which typically follow a transformer architecture. While the model we will primarily investigate has features very distinct from real-world transformers (we will comment on the distinction later), it can still give valuable insight into the principles underlying transformer capabilities. Roughly speaking, a transformer can be thought of as a function that, given an input sequence of any length, constructs an output sequence of the same length. It transforms one sequence into another.
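The length-preserving, sequence-to-sequence view above can be illustrated with a minimal sketch of the attention mechanism the note studies. This is an assumption-laden toy, not the paper's formal model: scores are given directly rather than computed from queries and keys, values are scalars, and ties are broken leftmost (as `np.argmax` returns the first maximal index), which corresponds to one common "leftmost" flavor of unique hard attention.

```python
import numpy as np

def unique_hard_attention(scores, values):
    """Unique hard attention: each query position attends to exactly ONE
    key position -- the one with the highest score. Ties break leftmost,
    since np.argmax returns the first maximal index."""
    idx = np.argmax(scores, axis=-1)  # one selected position per query
    return values[idx]                # output has the same length as the input

# Toy example: 4 input positions with scalar values.
scores = np.array([
    [0.1, 0.9, 0.3, 0.9],  # position 0 attends to position 1 (leftmost max)
    [0.5, 0.2, 0.8, 0.1],  # position 1 attends to position 2
    [0.7, 0.7, 0.1, 0.0],  # position 2 attends to position 0 (leftmost max)
    [0.0, 0.0, 0.0, 1.0],  # position 3 attends to position 3
])
values = np.array([10.0, 20.0, 30.0, 40.0])

out = unique_hard_attention(scores, values)  # -> [20. 30. 10. 40.]
```

Note that, unlike soft attention, the output is not a weighted average: each position copies exactly one value, which is what makes these models amenable to formal-language analysis.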
