Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning