MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training