Why pre-training is beneficial for downstream classification tasks?
