On the Efficacy of Generalization Error Prediction Scoring Functions