O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers

Neural Information Processing Systems 

Recently, Transformer networks have redefined the state of the art in many NLP tasks.
