BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization
