MolPILE -- large-scale, diverse dataset for molecular representation learning