Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning
