Recurrent Memory Transformer

Neural Information Processing Systems 

Experiments show that RMT performs on par with Transformer-XL on language modeling with smaller memory sizes and outperforms it on tasks that require processing longer sequences. We also show that adding memory tokens to Transformer-XL can improve its performance.
