Rethinking Evidence Hierarchies in Medical Language Benchmarks: A Critical Evaluation of HealthBench