No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets