In-Context Learning of Linear Dynamical Systems with Transformers: Error Bounds and Depth-Separation

Cole, Frank, Lu, Yulong, Zhang, Tianhao, Zhao, Yuxuan

Feb-13-2025–arXiv.org Machine Learning

This paper investigates approximation-theoretic aspects of the in-context learning capability of the transformers in representing a family of noisy linear dynamical systems. Our first theoretical result establishes an upper bound on the approximation error of multi-layer transformers with respect to an $L^2$-testing loss uniformly defined across tasks. This result demonstrates that transformers with logarithmic depth can achieve error bounds comparable with those of the least-squares estimator. In contrast, our second result establishes a non-diminishing lower bound on the approximation error for a class of single-layer linear transformers, which suggests a depth-separation phenomenon for transformers in the in-context learning of dynamical systems. Moreover, this second result uncovers a critical distinction in the approximation power of single-layer linear transformers when learning from IID versus non-IID data.

large language model, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

Feb-13-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology
  - Scientific Computing (0.93)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language > Large Language Model (0.67)
    - Machine Learning
      - Statistical Learning (0.68)
      - Neural Networks > Deep Learning (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found