ALPINE: Unveiling The Planning Capability of Autoregressive Learning in Language Models

Siwei Wang

Neural Information Processing Systems 

Our mathematical characterization shows that Transformer architectures can execute path-finding by embedding the adjacency and reachability matrices within their weights. Furthermore, our theoretical analysis of gradient-based learning dynamics reveals that LLMs can learn both the adjacency and a limited form of the reachability matrices.
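To make the role of the two matrices concrete, here is a minimal sketch of path-finding that consults only an adjacency matrix and a reachability matrix. It is an illustration under stated assumptions, not the paper's construction: the graph is assumed to be a small directed acyclic graph, and the names transitive_closure, find_path, adj and reach are hypothetical. At each step the sketch picks a neighbor of the current node that either is the target or can reach it.

```python
import numpy as np


def transitive_closure(adj):
    """Return reach, where reach[i, j] is True if some path of length >= 1 leads from i to j."""
    n = adj.shape[0]
    reach = adj.astype(bool).copy()
    for k in range(n):
        # Warshall's algorithm: i reaches j if i reaches k and k reaches j.
        reach |= reach[:, [k]] & reach[[k], :]
    return reach


def find_path(adj, reach, source, target):
    """Greedy path-finding using only adjacency and reachability lookups.

    Assumes a directed acyclic graph. Returns a list of nodes, or None if
    no path exists.
    """
    n = adj.shape[0]
    path = [source]
    current = source
    while current != target:
        # Valid next nodes: adjacent to current, and equal to or able to reach the target.
        candidates = [
            k for k in range(n)
            if adj[current, k] and (k == target or reach[k, target])
        ]
        if not candidates or len(path) > n:
            return None  # dead end, or a cycle violated the DAG assumption
        current = candidates[0]
        path.append(current)
    return path


# Hypothetical example graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 4
adj = np.zeros((5, 5), dtype=bool)
for u, v in [(0, 1), (0, 2), (1, 3), (2, 4)]:
    adj[u, v] = True
reach = transitive_closure(adj)

print(find_path(adj, reach, 0, 3))
```

In this example, node 2 is adjacent to node 0 but cannot reach node 3, so the reachability lookup rules it out and the search proceeds through node 1.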
