Sequence Length Independent Norm-Based Generalization Bounds for Transformers
