Staircase Attention for Recurrent Processing of Sequences

Neural Information Processing Systems 

In the Staircase model, Transformer cores are stacked diagonally, so each step sees one new input chunk.
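The diagonal stacking can be pictured as a schedule of which chunk the shared core processes at which step. The sketch below models only that schedule, not the attention computation. It rests on a simplifying assumption: at step t the core sees the new chunk t plus the states of the previous `window - 1` chunks, and a chunk's "depth" counts how many steps it has been processed. The names `staircase_schedule`, `render`, and the `window` parameter are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Tuple


def staircase_schedule(num_chunks: int, window: int) -> List[List[Tuple[int, int]]]:
    """For each step, list the (chunk_index, depth) pairs the shared core processes.

    Hypothetical model: step t introduces chunk t (depth 1) and reprocesses
    up to window - 1 earlier chunks, each one level deeper than before.
    """
    if num_chunks < 1 or window < 1:
        raise ValueError("num_chunks and window must be positive")
    schedule = []
    for t in range(num_chunks):
        # Oldest chunk still inside the window, up to the newly arrived chunk t.
        first = max(0, t - window + 1)
        schedule.append([(c, t - c + 1) for c in range(first, t + 1)])
    return schedule


def render(schedule: List[List[Tuple[int, int]]], num_chunks: int) -> str:
    """Draw chunks as rows and steps as columns; cells show depth, '.' if unseen."""
    rows = []
    for c in range(num_chunks):
        cells = []
        for step in schedule:
            depth = dict(step).get(c)
            cells.append(str(depth) if depth is not None else ".")
        rows.append(" ".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render(staircase_schedule(num_chunks=4, window=3), num_chunks=4))
    # 1 2 3 .
    # . 1 2 3
    # . . 1 2
    # . . . 1
```

Each column contains exactly one depth-1 entry (the new chunk), and the diagonal pattern shows how a chunk moves deeper through the shared core as later chunks arrive.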
