Fast Attention Over Long Sequences With Dynamic Sparse Flash Attention

Neural Information Processing Systems 

Transformer-based language models have found many diverse applications requiring them to process sequences of increasing length.
