On Evaluating Explanation Utility for Human-AI Decision Making in NLP