Beyond One-Size-Fits-All: Tailored Benchmarks for Efficient Evaluation