Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
Neural Information Processing Systems
Most crucially, they require prohibitively large amounts of additional memory, since they rely on backpropagation, which allocates almost twice as much GPU memory as the forward pass. This makes it difficult, if not impossible, to use explanations in production.