The Art of Breaking Words: Rethinking Multilingual Tokenizer Design