Challenges and Opportunities in NLP Benchmarking