How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation