WhisQ: Cross-Modal Representation Learning for Text-to-Music MOS Prediction
