Forest vs Tree: The $(N, K)$ Trade-off in Reproducible ML Evaluation