Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs