Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure

Neural Information Processing Systems 

Humans can length-generalize in integer addition because they understand the essential principle of the task. Nevertheless, it is observed that Transformers typically learn to solve addition only up to the training sequence length (Lee et al., 2024), which is different from the true arithmetic algorithm that humans "implement".
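
As a point of reference for what length generalization means here, the following is a minimal sketch of the grade-school addition procedure with carries. It is an illustration only, not the paper's method or model; the function name `schoolbook_add` is a hypothetical one chosen for this example. The same per-digit rule is applied at every position, so the procedure works for operands of any length.

```python
def schoolbook_add(a: str, b: str) -> str:
    """Add two non-negative decimal integers given as digit strings."""
    # Pad the shorter operand with leading zeros so digits line up by place value.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0
    digits = []
    # Walk from the least significant digit to the most significant one.
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))  # digit written at this position
        carry = total // 10             # carry passed to the next position

    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(schoolbook_add("999", "1"))
    print(schoolbook_add("9" * 50, "1"))
```

Nothing in the loop depends on the operand length, which is the sense in which the rule, once understood, extends to longer inputs than any seen before.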
