Are Anomaly Scores Telling the Whole Story? A Benchmark for Multilevel Anomaly Detection