Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers
