A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification