Why is pre-training beneficial for downstream classification tasks?