MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
