An Information-Theoretic Analysis of In-Context Learning

Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy

Jan-27-2024–arXiv.org Artificial Intelligence

Previous theoretical results pertaining to meta-learning on sequences build on contrived assumptions and are somewhat convoluted. We introduce new information-theoretic tools that lead to an elegant and very general decomposition of error into three components: irreducible error, meta-learning error, and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers. Our theoretical results characterize how error decays in both the number of training sequences and the sequence length. Our results are very general; for example, they avoid the contrived mixing-time assumptions made by all prior results that establish decay of error with sequence length.
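The abstract names a three-way split of error without spelling it out. As a rough illustration only, the sketch below assumes log-loss (entropy in nats) as the error measure and a made-up hierarchical Bernoulli model: a meta-parameter psi chooses a prior over a task parameter theta, and each of M tasks emits a sequence of T coin flips, with D denoting all M*T flips. Under those assumptions the Bayes-optimal per-token loss H(D)/(MT) splits, by the chain rule of mutual information, into H(D | theta)/(MT) (irreducible), I(D; psi)/(MT) (meta-learning) and I(D; theta | psi)/(MT) (intra-task). The toy model, the names `P_PSI`, `P_THETA_GIVEN_PSI` and `decompose`, and this exact form of the split are hypothetical and not taken from the paper.

```python
import math
from itertools import product

# Hypothetical toy model (illustrative assumption, not from the paper):
# psi picks a prior over theta; each task emits T i.i.d. Bernoulli(theta) bits.
P_PSI = {0: 0.5, 1: 0.5}
P_THETA_GIVEN_PSI = {
    0: {0.1: 0.7, 0.5: 0.3},
    1: {0.9: 0.7, 0.5: 0.3},
}


def binary_entropy(p):
    """Entropy of a Bernoulli(p) bit in nats (assumes 0 < p < 1)."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def seq_prob(k, T, psi):
    """P(one specific length-T sequence with k ones | psi), theta marginalized out."""
    return sum(w * th**k * (1 - th) ** (T - k)
               for th, w in P_THETA_GIVEN_PSI[psi].items())


def seq_entropy_given_psi(T, psi):
    """H(one sequence | psi), grouping sequences by their count k of ones."""
    return -sum(math.comb(T, k) * seq_prob(k, T, psi) * math.log(seq_prob(k, T, psi))
                for k in range(T + 1))


def decompose(M, T):
    """Per-token (irreducible, meta-learning, intra-task) log-loss, in nats."""
    # H(D | theta): Bernoulli noise that even a learner who knows theta pays.
    h_given_theta = M * T * sum(
        p_psi * w * binary_entropy(th)
        for psi, p_psi in P_PSI.items()
        for th, w in P_THETA_GIVEN_PSI[psi].items())
    # H(D | psi): given psi the M sequences are i.i.d., so their entropies add.
    h_given_psi = M * sum(p_psi * seq_entropy_given_psi(T, psi)
                          for psi, p_psi in P_PSI.items())
    # H(D): enumerate the count of ones in each of the M sequences; every
    # sequence sharing those counts has the same probability p.
    h_total = 0.0
    for ks in product(range(T + 1), repeat=M):
        p = sum(p_psi * math.prod(seq_prob(k, T, psi) for k in ks)
                for psi, p_psi in P_PSI.items())
        h_total -= math.prod(math.comb(T, k) for k in ks) * p * math.log(p)
    n = M * T
    return (h_given_theta / n,                   # irreducible
            (h_total - h_given_psi) / n,         # I(D; psi) / (MT)
            (h_given_psi - h_given_theta) / n)   # I(D; theta | psi) / (MT)


for M in (1, 2, 4):
    print("M =", M, [round(x, 4) for x in decompose(M, T=4)])
for T in (2, 4, 8):
    print("T =", T, [round(x, 4) for x in decompose(1, T)])
```

In this toy, the meta-learning term shrinks as the number of sequences M grows, the intra-task term shrinks as the sequence length T grows (and does not depend on M), and the irreducible term stays fixed. That mirrors the qualitative claim in the abstract, but the toy says nothing about transformers or about the paper's actual bounds.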

artificial intelligence, information-theoretic analysis, machine learning