Combiner: Full Attention Transformer with Sparse Computation Cost

Neural Information Processing Systems 

Transformers provide a class of expressive architectures that are extremely effective for sequence modeling.
