Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark