Episodic fine-tuning prototypical networks for optimization-based few-shot learning: Application to audio classification