Provable Length Generalization in Sequence Prediction via Spectral Filtering
Annie Marsden, Evan Dogariu, Naman Agarwal, Xinyi Chen, Daniel Suo, Elad Hazan
Sequence prediction is a fundamental problem in machine learning with widespread applications in natural language processing, time-series forecasting, and control systems. In this setting, a learner observes a sequence of tokens and iteratively predicts the next token, suffering a loss that measures the discrepancy between the predicted and the true token. A key challenge in sequence prediction is understanding the role of context length, the number of previous tokens used to make the upcoming prediction, and designing predictors that perform well when context is limited by computational and memory constraints. These resource constraints are particularly significant during the training phase of a predictor, where the computational cost of processing long sequences can be prohibitive. Consequently, it is beneficial to design predictors that can learn from a shorter context length while still generalizing well to longer sequences. This leads us to the central question of our investigation: can we develop algorithms that learn effectively using short contexts but perform comparably to models that use longer contexts?
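To make the prediction setting concrete, the following is a minimal sketch of next-token prediction from a bounded context using features in the spirit of spectral filtering from prior work (Hazan, Singh, Zhang, 2017), not the exact construction of this paper. The window length L, filter count k, learning rate, toy sequence, and function names are illustrative assumptions, and details such as eigenvalue scaling of the filters are omitted.

    # Sketch only: context-limited online prediction with spectral-filter
    # features. Parameters (L, k, lr) and the toy sequence are assumptions
    # for illustration, not values from the paper.
    import numpy as np

    def spectral_filters(L: int, k: int) -> np.ndarray:
        # Top-k eigenvectors of the L x L Hankel matrix Z with
        # Z[i, j] = 2 / ((i + j)^3 - (i + j)), indices starting at 1,
        # as in the spectral filtering literature.
        idx = np.arange(1, L + 1)
        s = idx[:, None] + idx[None, :]      # i + j
        Z = 2.0 / (s**3 - s)
        eigvals, eigvecs = np.linalg.eigh(Z)  # eigenvalues in ascending order
        return eigvecs[:, -k:]                # filters = top-k eigenvectors

    def featurize(history: np.ndarray, filters: np.ndarray) -> np.ndarray:
        # Project the last L observations onto the filters, zero-padding
        # short histories at the front.
        L = filters.shape[0]
        if len(history) >= L:
            window = history[-L:]
        else:
            window = np.pad(history, (L - len(history), 0))
        return filters.T @ window             # k-dimensional feature vector

    # Online loop: predict from a short context, suffer squared loss,
    # and update a linear predictor by online gradient descent.
    rng = np.random.default_rng(0)
    L, k, T, lr = 32, 8, 500, 0.1
    Phi = spectral_filters(L, k)
    w = np.zeros(k)
    xs = np.sin(0.3 * np.arange(T)) + 0.05 * rng.standard_normal(T)

    total_loss = 0.0
    for t in range(1, T):
        feats = featurize(xs[:t], Phi)        # only past tokens are visible
        pred = w @ feats
        err = pred - xs[t]
        total_loss += err**2
        w -= lr * 2.0 * err * feats           # gradient of the squared loss
    print(f"average squared loss: {total_loss / (T - 1):.4f}")

Note that the predictor above only ever sees the most recent L tokens, which is the context-length restriction the abstract is concerned with; the paper's question is when such a short-context learner can match the performance of longer-context models.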
arXiv.org Artificial Intelligence
Nov-1-2024