We thank all the reviewers for their valuable comments and for acknowledging the significance and timeliness of this work
–Neural Information Processing Systems
We thank all the reviewers for their valuable comments and for acknowledging the significance and timeliness of this work. MelGAN has important qualities such as: 1.) fast inference speed (2500 kHz) [...] For the experimental results in tables 3.1 and 3.2, we use the publicly available [...] For section 3.3, we use a subset of the MusicNet dataset (Thickstun et al., 2016), similar to Mor et al. For the VQ-VAE experiment, we use the piano dataset provided by Dieleman et al. (2018). The MOS of ground truth audio in this dataset is 4.72. [...] WaveNet, since Prenger et al. (2019) show that WaveGlow performs similarly to WaveNet in ground truth mel-spectrogram [...] This is the reason for the discrepancy in MOS scores in the two tables.
Oct-2-2025, 21:53:40 GMT