Algorithmic Capabilities of Random Transformers

Oct-10-2025, 15:09:10 GMT – Neural Information Processing Systems

Why is this the case? One possibility is that some aspect of the transformer architecture makes these behaviors easy to learn. Under this hypothesis, transformer models do not implement any useful functionality when initialized; however, their loss landscape is structured such that they can be (computation- and sample-) efficiently optimized for behaviors of interest.


Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Vision (0.93)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
