Optimal Memorization Capacity of Transformers
arXiv.org Artificial Intelligence
In recent years, the Transformer architecture (Vaswani et al., 2017) has played a pivotal role in machine learning, becoming a core component of a wide variety of models. Beyond the original breakthroughs in natural language processing, such as the GPT series (Brown et al., 2020; Radford et al., 2018, 2019), it has been observed in numerous applications that higher accuracy can be achieved by replacing existing models with Transformers. In particular, the Vision Transformer (Dosovitskiy et al., 2021) in image processing and the Diffusion Transformer (Peebles & Xie, 2023) in generative modeling have demonstrated exceptional performance across a wide range of tasks. These examples illustrate how effective and versatile Transformers are for diverse purposes. Although their high performance has led to widespread practical use, there are ongoing attempts to theoretically analyze what exactly contributes to their superior performance.
Sep-26-2024