Effects of sub-word segmentation on performance of transformer language models
