Aligning Text-to-Music Evaluation with Human Preferences