BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation
Park, Taesoo; Jeong, Mungwi; Park, Mingyu; Kim, Narae; Kim, Junyoung; Kim, Mujung; Yoo, Jisang; Lee, Hoyun; Kim, Sanghoon; Kwon, Soonchul
arXiv.org Artificial Intelligence
This paper presents a tutorial-style survey and implementation guide for BemaGANv2, an advanced GAN-based vocoder designed for high-fidelity, long-term audio generation. Long-term audio generation is critical for Text-to-Music (TTM) and Text-to-Audio (TTA) systems, where maintaining temporal coherence, prosodic consistency, and harmonic structure over extended durations remains a significant challenge. Building on the original BemaGAN architecture, BemaGANv2 replaces the traditional ResBlocks in the generator with the Anti-aliased Multi-Periodicity composition (AMP) module, which internally applies the Snake activation function to better model periodic structures. In the discriminator, we integrate the Multi-Envelope Discriminator (MED), a novel architecture we proposed, to extract rich temporal envelope features that are crucial for periodicity detection. Combined with the Multi-Resolution Discriminator (MRD), this pairing enables more accurate modeling of long-range dependencies in audio. We systematically evaluate several discriminator configurations, including Multi-Scale Discriminator (MSD) + MED, MSD + MRD, and Multi-Period Discriminator (MPD) + MED + MRD, using objective metrics (Fréchet Audio Distance (FAD), Structural Similarity Index (SSIM), Pearson Correlation Coefficient (PCC), and Mel-Cepstral Distortion (MCD)) and subjective evaluations (MOS and SMOS). This paper also provides a comprehensive tutorial on the model architecture, training methodology, and implementation to promote reproducibility. The code and pre-trained models are available at: https://github.com/dinhoitt/BemaGANv2.
Nov-25-2025
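The abstract credits the Snake activation inside the AMP module with helping the generator model periodic structure. The sketch below is a minimal, dependency-free illustration of the standard Snake function, snake(x) = x + (1/α)·sin²(αx). It is not the BemaGANv2 implementation. In AMP-style blocks α is usually a learnable per-channel parameter and the activation is wrapped in anti-aliased up/downsampling, but here α is assumed to be a fixed scalar. The helper names `snake` and `snake_sequence` are hypothetical. The demo checks the property that motivates the design: shifting the input by one period π/α shifts the output by exactly π/α, so the non-linearity adds a purely periodic component on top of an identity trend.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + (1 / alpha) * sin^2(alpha * x).

    Assumption for illustration: alpha is a fixed scalar here, whereas
    AMP-style blocks typically learn one alpha per channel.
    """
    if alpha == 0:
        raise ValueError("alpha must be non-zero")
    return x + math.sin(alpha * x) ** 2 / alpha


def snake_sequence(xs, alpha: float = 1.0):
    """Apply snake element-wise to a list of samples (hypothetical helper)."""
    return [snake(x, alpha) for x in xs]


if __name__ == "__main__":
    alpha = 2.0
    period = math.pi / alpha  # sin^2(alpha * x) repeats every pi / alpha
    for x in (-1.0, 0.0, 0.5, 3.0):
        # Shifting the input by one period shifts the output by that same
        # period: the linear term keeps the trend, the sin^2 term is periodic.
        delta = snake(x + period, alpha) - snake(x, alpha)
        print(f"x={x:+.2f}  snake={snake(x, alpha):+.4f}  shift_delta={delta:.4f}")
```

Unlike ReLU or Leaky ReLU, which extrapolate linearly, this form lets a network represent oscillations with a frequency set by α. That is the motivation for using it in vocoders that must reproduce pitch and harmonic structure over long durations.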