Self-tuning hyper-parameters for unsupervised cross-lingual tokenization