Assessing the Role of Data Quality in Training Bilingual Language Models