Cost-effective speech-to-text with weakly- and semi-supervised training