LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

Zhuo, Le, Yuan, Ruibin, Pan, Jiahao, Ma, Yinghao, Li, Yizhi, Zhang, Ge, Liu, Si, Dannenberg, Roger, Fu, Jie, Lin, Chenghua, Benetos, Emmanouil, Chen, Wenhu, Xue, Wei, Guo, Yike

arXiv.org Artificial Intelligence 

Figure 1. Concept illustration of how LyricWhiz works, where the user prompts the two advanced models, Whisper and ChatGPT, to perform automatic lyrics transcription.

ABSTRACT We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain," acting as a strong annotator for contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create