ECBD: Evidence-Centered Benchmark Design for NLP