M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining