Generative Multi-modal Feedback for Singing Voice Synthesis Evaluation