Revisiting Confidence Estimation: Towards Reliable Failure Prediction