Recurrent Memory-Augmented Transformers with Chunked Attention for Long-Context Language Modeling
