A Formal Framework for Understanding Length Generalization in Transformers
