How Much Annotation is Needed to Compare Summarization Models?