Fine-tuning for Better Few Shot Prompting: An Empirical Comparison for Short Answer Grading