Metis: A Foundation Speech Generation Model with Masked Generative Pre-training