Toward a Stable, Fair, and Comprehensive Evaluation