Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
Neural Information Processing Systems
Humans can length-generalize in integer addition because they understand the essential principle of the task. Nevertheless, it has been observed that Transformers typically learn to solve addition only up to the sequence lengths seen during training (Lee et al., 2024), which differs from the true arithmetic algorithm that humans "implement".
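To make "the essential principle of the task" concrete, here is a minimal sketch of the schoolbook carry algorithm. It illustrates only what a length-independent addition procedure looks like and is not the paper's method. The function name `add_digitwise` is hypothetical, and operands are assumed to be non-negative decimal strings.

```python
def add_digitwise(a: str, b: str) -> str:
    """Schoolbook addition of two non-negative decimal strings (illustrative sketch).

    The same per-digit rule (add two digits plus the incoming carry, emit the
    ones digit, pass on the tens digit) is applied at every position, so the
    procedure works unchanged for operands of any length.
    """
    # Reverse both operands so index i refers to the 10**i place in each.
    a_rev, b_rev = a[::-1], b[::-1]
    result = []
    carry = 0
    for i in range(max(len(a_rev), len(b_rev))):
        # Missing digits in the shorter operand are treated as 0.
        da = int(a_rev[i]) if i < len(a_rev) else 0
        db = int(b_rev[i]) if i < len(b_rev) else 0
        total = da + db + carry
        result.append(str(total % 10))  # digit written at this position
        carry = total // 10             # carry passed to the next position
    if carry:
        result.append(str(carry))
    # Restore most-significant-first order; empty inputs yield "0".
    return "".join(reversed(result)) or "0"
```

For example, `add_digitwise("57", "68")` returns `"125"`, and the same function adds 50-digit operands without any change. This independence from operand length is the kind of generalization that, as noted above, Transformers typically fail to achieve beyond their training lengths.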
Feb-9-2026, 18:52:25 GMT