The Transformer Cookbook
Yang, Andy; Watson, Christopher; Xue, Anton; Bhattamishra, Satwik; Llarena, Jose; Merrill, William; Ferreira, Emile Dos Santos; Svete, Anej; Chiang, David
–arXiv.org Artificial Intelligence
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a problem exacerbated by a fragmented literature in which key results are scattered across numerous papers. To this end, we synthesize this disparate body of findings into a curated set of recipes that demonstrate how to implement everything from basic arithmetic in feed-forward layers to complex data routing via self-attention. Our mise en place of formulations is intended both for newcomers seeking an accessible entry point and for experts in need of a systematic reference. This unified presentation of transformer constructions provides a foundation for future work ranging from theoretical research in computational complexity to empirical investigations in architecture design and interpretability.
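As an illustration of the kind of "basic arithmetic in feed-forward layers" recipe mentioned above, the sketch hand-sets the weights of a two-layer ReLU network so that it computes max(x, y) exactly. It relies on the identity max(x, y) = ReLU(x - y) + y, with y carried through the ReLU as ReLU(y) - ReLU(-y). This is a minimal sketch under our own assumptions: the names `W1`, `b1`, `W2`, and `ffn_max` are hypothetical and chosen for illustration, and the construction is not claimed to be the one given in the paper.

```python
import numpy as np

# Hypothetical hand-set weights (illustrative only, not taken from the paper).
# Row i of W1 defines hidden unit i; the three hidden units compute
# ReLU(x - y), ReLU(y), and ReLU(-y).
W1 = np.array([[1.0, -1.0],
               [0.0,  1.0],
               [0.0, -1.0]])
b1 = np.zeros(3)

# Output layer: ReLU(x - y) + ReLU(y) - ReLU(-y) = ReLU(x - y) + y = max(x, y).
W2 = np.array([1.0, 1.0, -1.0])


def relu(z):
    """Elementwise rectified linear unit."""
    return np.maximum(z, 0.0)


def ffn_max(x, y):
    """Two-layer ReLU feed-forward network that returns max(x, y) exactly."""
    h = relu(W1 @ np.array([x, y], dtype=float) + b1)
    return float(W2 @ h)


print(ffn_max(3.0, 5.0))
print(ffn_max(-1.0, -4.0))
```

No approximation is involved: because ReLU is piecewise linear, three hidden units suffice to represent max exactly for any real inputs, which is the kind of exact construction such recipes aim for.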
Oct-2-2025