NLP Tutorials -- Part 20: Compressive Transformer
Welcome back to yet another interesting improvement on the Transformer ("Attention Is All You Need") architecture: the Compressive Transformer. This architecture has a lower memory requirement than the vanilla Transformer and, like Transformer-XL, models longer sequences efficiently. The image below depicts how the memory is compressed. There is also a parallel to the human brain: we have a brilliant memory because we compress and store information very intelligently. Interesting, isn't it?
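To make the idea of compressing memory a bit more concrete, here is a minimal sketch in plain Python. It assumes a fixed-size FIFO memory of past hidden states; states that fall out of it are not discarded but squeezed into a smaller compressed memory. Mean pooling is used here only as a simple stand-in for the compression function, and the names `mean_pool`, `update_memories`, `mem_size`, `cmem_size`, and `rate` are hypothetical, chosen for illustration rather than taken from any library.

```python
from typing import List, Tuple

Vector = List[float]


def mean_pool(vectors: List[Vector], rate: int) -> List[Vector]:
    """Average each consecutive group of `rate` vectors into a single vector."""
    pooled = []
    for start in range(0, len(vectors), rate):
        group = vectors[start:start + rate]
        dim = len(group[0])
        pooled.append([sum(v[i] for v in group) / len(group) for i in range(dim)])
    return pooled


def update_memories(
    memory: List[Vector],
    compressed: List[Vector],
    new_states: List[Vector],
    mem_size: int,
    cmem_size: int,
    rate: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Append new states to the memory and compress whatever overflows.

    Assumption: mean pooling stands in for the compression function.
    """
    memory = memory + new_states
    overflow = max(0, len(memory) - mem_size)
    # The oldest states leave the regular memory first (FIFO).
    evicted, memory = memory[:overflow], memory[overflow:]
    if evicted:
        compressed = compressed + mean_pool(evicted, rate)
    # Keep only the most recent `cmem_size` compressed slots.
    compressed = compressed[max(0, len(compressed) - cmem_size):]
    return memory, compressed


# Two segments of four 1-dimensional states each.
memory, compressed = [], []
for segment in ([[0.0], [1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0], [7.0]]):
    memory, compressed = update_memories(
        memory, compressed, segment, mem_size=4, cmem_size=4, rate=2
    )
```

After the second segment, the four oldest states have been evicted from `memory` and stored as two averaged vectors in `compressed`, so with `rate=2` the older context takes half the space it did before.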
Jun-3-2022, 07:00:32 GMT