Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training
