MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
