Improving Model Evaluation using SMART Filtering of Benchmark Datasets