On Measuring the Intrinsic Few-Shot Hardness of Datasets