All models are wrong, some are useful: Model Selection with Limited Labels