The Elephant in the Room: Towards A Reliable Time-Series Anomaly Detection Benchmark