Sequence Length Independent Norm-Based Generalization Bounds for Transformers
Since Vaswani et al. (2017) introduced the Transformer, it has become one of the preeminent architectures of its time. It has achieved state-of-the-art predictive performance in a variety of fields (Dosovitskiy et al., 2020; Wu et al., 2022; Vaswani et al., 2017; Pettersson and Falkman, 2023), and an implementation of it has even passed the bar exam (Katz et al., 2023). With such widespread use, the theoretical underpinnings of this architecture are of great interest. Specifically, this paper is concerned with bounding the generalization error of the Transformer in supervised learning. Upper bounds on this error help explain how the sample size must scale with different architecture parameters, and they are a common theoretical tool for understanding machine learning algorithms (Kakade et al., 2008; Garg et al., 2020; Truong, 2022; Lin and Zhang, 2019).
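For concreteness, one standard way to formalize the quantity being bounded (a generic textbook definition, not a statement of this paper's specific setup) is the gap between the population risk and the empirical risk of a learned hypothesis $f$:
\[
\mathrm{gen}(f) \;=\; \underbrace{\mathbb{E}_{(x,y)\sim \mathcal{D}}\big[\ell(f(x),y)\big]}_{\text{population risk}} \;-\; \underbrace{\frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i),y_i)}_{\text{empirical risk on } n \text{ samples}},
\]
where $\ell$ is a loss function and $\{(x_i,y_i)\}_{i=1}^{n}$ are i.i.d. training examples drawn from $\mathcal{D}$. Norm-based bounds control this gap in terms of norms of the network's weight matrices rather than raw parameter counts.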