Evaluating the Robustness of Test Selection Methods for Deep Neural Networks