Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts