Multi-Word Tokenization for Sequence Compression