Evaluating the Evaluation of Diversity in Commonsense Generation