Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation