Learning Spectral Methods by Transformers
Yihan He, Yuan Cao, Hong-Yu Chen, Dennis Wu, Jianqing Fan, Han Liu
Most modern LLMs use Transformers [30] as their backbone, which demonstrates significant advantages over many existing neural network models. Transformers achieve state-of-the-art performance in learning tasks including natural language processing [33] and computer vision [18]. However, the underlying mechanism behind the success of Transformers remains largely a mystery to theoretical researchers. A line of recent works [2, 4, 15, 38] argues that, instead of learning simple prediction rules (such as a linear model), Transformers are capable of learning to perform learning algorithms that automatically generate new prediction rules. For instance, when a new dataset is organized as the input to a Transformer, the model can automatically perform linear regression on this dataset to produce a newly fitted linear model and make predictions accordingly. This idea of treating Transformers as "algorithm approximators" has provided insights into the power of large language models. However, these existing works only provide guarantees for the in-context supervised learning capacities of Transformers; it remains unclear whether Transformers can handle unsupervised tasks as well.
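To make the in-context linear regression setup concrete, the sketch below shows one common way such a prompt is formatted and the least-squares prediction the Transformer is expected to emulate. The token layout (stacking each labeled pair into a single vector, with a zero placeholder in the query's label slot) is an illustrative convention from this literature, not necessarily the exact construction used in this paper.

```python
import numpy as np

# Illustrative sketch: a prompt stacks labeled examples (x_i, y_i)
# followed by a query x_q; an "algorithm-approximating" Transformer
# is expected to output the least-squares prediction w_hat^T x_q.
# Shapes and formatting are assumptions, not the paper's construction.

rng = np.random.default_rng(0)
d, n = 5, 20                        # feature dimension, in-context examples
w_star = rng.normal(size=d)         # ground-truth linear model
X = rng.normal(size=(n, d))         # in-context inputs
y = X @ w_star                      # in-context labels (noiseless)
x_query = rng.normal(size=d)        # query point

# Each example becomes a (d+1)-dim token [x_i; y_i]; the query token
# carries a zero placeholder where the unknown label would go.
tokens = np.hstack([X, y[:, None]])         # shape (n, d+1)
query_token = np.append(x_query, 0.0)       # shape (d+1,)
prompt = np.vstack([tokens, query_token])   # Transformer input
print("prompt shape:", prompt.shape)

# The prediction rule the Transformer would implicitly fit in context:
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("in-context target prediction:", w_hat @ x_query)
```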
Jan-12-2025