Non-asymptotic Convergence of Training Transformers for Next-token Prediction

Neural Information Processing Systems 

The theoretical understanding of training transformers for next-token prediction (NTP) is limited, with existing studies focusing mainly on asymptotic performance. This paper provides a fine-grained non-asymptotic analysis of the training dynamics of a one-layer transformer consisting of a self-attention module followed by a feed-forward layer.
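The architecture analyzed — a single self-attention module followed by a feed-forward layer — can be sketched as below. This is a minimal illustrative implementation in NumPy; the weight names, single attention head, and ReLU activation are assumptions for exposition, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def one_layer_transformer(X, Wq, Wk, Wv, Wff):
    """One-layer transformer: self-attention, then a feed-forward layer.

    X: (T, d) sequence of token embeddings.
    Returns a (T, d) representation; in NTP the last row is used to
    score the next token (e.g. via a linear readout to the vocabulary).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (T, T) attention weights
    H = A @ V                                    # attended representation
    return np.maximum(H @ Wff, 0.0)              # ReLU feed-forward layer

rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wff = rng.normal(size=(d, d))
out = one_layer_transformer(X, Wq, Wk, Wv, Wff)
print(out.shape)  # (5, 8)
```

A non-asymptotic analysis would then track how gradient updates to these weight matrices reduce the NTP loss as a function of the finite number of training steps, rather than only in the limit.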
