Context-aware Fine-tuning of Self-supervised Speech Models