Understanding Transformers via N-gram Statistics
Neural Information Processing Systems
Transformer-based large language models (LLMs) display extreme proficiency with language, yet a precise understanding of how they work remains elusive. One way of demystifying transformer predictions would be to describe how they depend on their context in terms of simple template functions. This paper takes a first step in this direction by considering families of functions (i.e. rules) formed out of simple N-gram based statistics of the training data.
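To make the idea concrete, here is a minimal sketch of one such template function: an N-gram rule that maps the last N-1 context tokens to an empirical next-token distribution estimated from the training data, which can then be compared against a model's prediction. This is an illustrative reconstruction under stated assumptions, not the paper's code; the helper names (`build_ngram_rule`, `variation_distance`), the toy corpus, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_ngram_rule(corpus_tokens, n):
    """Build a simple N-gram rule: map each (n-1)-token context to the
    empirical next-token distribution observed in the training data."""
    counts = defaultdict(Counter)
    for i in range(len(corpus_tokens) - n + 1):
        context = tuple(corpus_tokens[i : i + n - 1])
        next_token = corpus_tokens[i + n - 1]
        counts[context][next_token] += 1
    # Normalize raw counts into probability distributions.
    rule = {}
    for context, counter in counts.items():
        total = sum(counter.values())
        rule[context] = {tok: c / total for tok, c in counter.items()}
    return rule

def variation_distance(p, q):
    """Total variation distance between two next-token distributions:
    one simple way to score how closely a rule tracks a model's output."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in support)

# Toy usage with a whitespace-tokenized corpus (assumed for illustration).
tokens = "the cat sat on the mat the cat ran".split()
rule = build_ngram_rule(tokens, n=2)
print(rule[("the",)])  # {'cat': 0.666..., 'mat': 0.333...}

# Hypothetical transformer next-token distribution for the same context.
model_pred = {"cat": 0.7, "mat": 0.2, "dog": 0.1}
print(variation_distance(rule[("the",)], model_pred))
```

The distance between the rule's distribution and the model's output gives a per-context measure of how well the N-gram statistic describes the transformer's prediction.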