Enhance audio generation controllability through representation similarity regularization
