Understanding Transformers via N-Gram Statistics
