Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms