Learning Spectral Methods by Transformers
Yihan He, Yuan Cao, Hong-Yu Chen, Dennis Wu, Jianqing Fan, Han Liu
Most modern LLMs use Transformers [30] as their backbone, which demonstrates significant advantages over many existing neural network models. Transformers achieve state-of-the-art performance in learning tasks including natural language processing [33] and computer vision [18]. However, the underlying mechanism behind the success of Transformers remains largely a mystery to theoretical researchers. A line of recent works [2, 4, 15, 38] argues that, instead of learning simple prediction rules (such as a linear model), Transformers are capable of learning to perform learning algorithms that automatically generate new prediction rules. For instance, when a new dataset is organized as the input to a Transformer, the model can automatically perform linear regression on this dataset to produce a newly fitted linear model and make predictions accordingly. This idea of treating Transformers as "algorithm approximators" has provided insights into the power of large language models. However, these existing works only provide guarantees for the in-context supervised learning capacities of Transformers; it remains unclear whether Transformers can handle unsupervised tasks as well.
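To make the in-context linear regression setup concrete, the sketch below shows one common way such a prompt is formatted and the least-squares prediction the Transformer is expected to emulate. The token layout (stacking each labeled pair into a single vector, with a zero placeholder in the query's label slot) is an illustrative convention from this literature, not necessarily the exact construction used in this paper.

```python
import numpy as np

# Illustrative sketch: a prompt stacks labeled examples (x_i, y_i)
# followed by a query x_q; an "algorithm-approximating" Transformer
# is expected to output the least-squares prediction w_hat^T x_q.
# Shapes and formatting are assumptions, not the paper's construction.

rng = np.random.default_rng(0)
d, n = 5, 20                        # feature dimension, in-context examples
w_star = rng.normal(size=d)         # ground-truth linear model
X = rng.normal(size=(n, d))         # in-context inputs
y = X @ w_star                      # in-context labels (noiseless)
x_query = rng.normal(size=d)        # query point

# Each example becomes a (d+1)-dim token [x_i; y_i]; the query token
# carries a zero placeholder where the unknown label would go.
tokens = np.hstack([X, y[:, None]])         # shape (n, d+1)
query_token = np.append(x_query, 0.0)       # shape (d+1,)
prompt = np.vstack([tokens, query_token])   # Transformer input
print("prompt shape:", prompt.shape)

# The prediction rule the Transformer would implicitly fit in context:
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("in-context target prediction:", w_hat @ x_query)
```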
Jan-12-2025