Understanding Transformer Predictions Through Memory Efficient Attention Manipulation

Neural Information Processing Systems 

Most crucially, they require prohibitively large amounts of additional memory: they rely on backpropagation, which allocates almost twice as much GPU memory as the forward pass alone. This makes it difficult, if not impossible, to use explanations in production.
