A Appendix

Neural Information Processing Systems 

A.1 Speech Translation Evaluation

One hyperparameter in our speech translation evaluation is the threshold on the alignment scores. Mined speech-text pairs are included in the train set if their alignment scores are greater than or equal to the threshold. Speech translation models are trained on the combination of the CoVoST2 train set and the mined data at different thresholds. We report the performance of each model on the dev set of Common Voice in Figure 5 and use it to find the optimal value for the threshold.

Figure 5: BLEU on the dev set achieved by S2T models trained on the CoVoST train set + mined data at different thresholds.
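The threshold selection described above can be sketched as follows. This is a minimal illustration, not our actual pipeline: `MinedPair`, `filter_by_threshold`, `select_threshold`, and the `train_and_eval` callback are hypothetical names introduced here, and the callback stands in for training an S2T model and scoring it with BLEU on the dev set.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class MinedPair(NamedTuple):
    """A mined speech-text pair with its alignment score (hypothetical structure)."""
    audio_id: str
    text: str
    score: float


def filter_by_threshold(pairs: Sequence[MinedPair], threshold: float) -> List[MinedPair]:
    """Keep pairs whose alignment score is greater than or equal to the threshold."""
    return [p for p in pairs if p.score >= threshold]


def select_threshold(
    base_train: Sequence[MinedPair],
    mined: Sequence[MinedPair],
    thresholds: Sequence[float],
    train_and_eval: Callable[[List[MinedPair]], float],
) -> Tuple[float, float]:
    """Return (threshold, dev BLEU) for the threshold with the highest dev BLEU.

    `train_and_eval` is an assumed callback: it trains a model on the given
    examples and returns its BLEU on the dev set.
    """
    if not thresholds:
        raise ValueError("at least one threshold is required")
    best_threshold, best_bleu = thresholds[0], float("-inf")
    for threshold in thresholds:
        # Train set = base train set + mined pairs at or above the threshold.
        train_set = list(base_train) + filter_by_threshold(mined, threshold)
        bleu = train_and_eval(train_set)
        if bleu > best_bleu:
            best_threshold, best_bleu = threshold, bleu
    return best_threshold, best_bleu
```

A lower threshold admits more mined pairs at the cost of noisier alignments, so the sweep trades off data quantity against quality; the dev-set BLEU decides where that trade-off is best.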