Pre-Finetuning for Few-Shot Emotional Speech Recognition