PitchFlower: A flow-based neural audio codec with pitch controllability
Diego Torres, Axel Roebel, Nicolas Obin
arXiv.org Artificial Intelligence
Our approach enforces disentanglement through a simple perturbation: during training, F0 contours are flattened and randomly shifted, while the true F0 is provided as conditioning. A vector-quantization bottleneck prevents pitch recovery, and a flow-based decoder generates high-quality audio. Experiments show that PitchFlower achieves more accurate pitch control than WORLD at much higher audio quality, and outperforms SiFi-GAN in controllability while maintaining comparable quality. Beyond pitch, this framework provides a simple and extensible path toward disentangling other speech attributes.
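The perturbation described above (flatten the F0 contour, then shift it randomly) can be illustrated on a frame-level F0 track. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `flatten_and_shift_f0`, the use of the geometric mean as the flat target, the uniform shift range in semitones, and the convention that unvoiced frames are marked with 0 Hz are all hypothetical choices. In the paper the perturbation is applied to the training audio; here only the contour transformation is shown.

```python
import numpy as np

def flatten_and_shift_f0(f0_hz, max_shift_semitones=12.0, rng=None):
    """Flatten a frame-level F0 contour and apply one random global shift.

    Assumptions (not from the paper): unvoiced frames are encoded as 0 Hz,
    the flat value is the geometric mean of the voiced frames, and the
    shift is drawn uniformly in [-max_shift_semitones, max_shift_semitones].
    Returns the perturbed contour and the shift in semitones.
    """
    rng = np.random.default_rng() if rng is None else rng
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = f0_hz > 0
    perturbed = np.zeros_like(f0_hz)
    if not voiced.any():
        return perturbed, 0.0

    # Flatten: replace every voiced frame by one constant pitch value.
    flat_hz = np.exp(np.mean(np.log(f0_hz[voiced])))

    # Shift: move the constant pitch by a random number of semitones.
    shift = rng.uniform(-max_shift_semitones, max_shift_semitones)
    perturbed[voiced] = flat_hz * 2.0 ** (shift / 12.0)
    return perturbed, shift

# Usage: the perturbed contour would drive the input perturbation, while
# the original contour stays available as conditioning for the decoder.
true_f0 = np.array([0.0, 100.0, 200.0, 0.0, 400.0])
perturbed_f0, shift = flatten_and_shift_f0(true_f0, rng=np.random.default_rng(0))
```

Because the perturbed input carries no usable pitch trajectory, a model trained this way has to take pitch from the conditioning signal, which is the disentanglement effect the abstract describes.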
Oct-30-2025
- Genre:
- Research Report (0.82)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.96)
- Natural Language (0.69)
- Speech (0.69)