SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech

Kim, Minchan, Jeong, Myeonghun, Lee, Joun Yeop, Kim, Nam Soo

arXiv.org Artificial Intelligence

SegINR leverages an optimal text encoder to extract embeddings, transforming each into a segment of frame-level features using a conditional implicit neural representation (INR). This segment-wise INR (SegINR) models the temporal dynamics within each segment and autonomously determines segment boundaries, reducing computational cost. We integrate SegINR into a two-stage TTS framework, using it for semantic token prediction. Our experiments in zero-shot adaptive TTS scenarios demonstrate that SegINR outperforms conventional methods in speech quality while remaining computationally efficient.
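The core idea above can be sketched in code: each text-unit embedding conditions a small network that, given a frame index, emits that frame's features plus a stop signal, so the segment's length (and hence the alignment boundary) is decided by the model itself. The sketch below is a minimal illustration under assumed dimensions and randomly initialized weights, not the paper's actual architecture; `seg_inr`, the dimensions, and the sigmoid stop criterion are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): embedding, frame-feature, hidden sizes.
EMB_DIM, FEAT_DIM, HIDDEN = 8, 4, 16

# Random weights stand in for a trained conditional INR.
W1 = rng.normal(0, 0.5, (EMB_DIM + 1, HIDDEN))   # input: [embedding ; frame index]
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.5, (HIDDEN, FEAT_DIM + 1))  # output: frame features + stop logit
b2 = np.zeros(FEAT_DIM + 1)

def seg_inr(embedding, max_frames=20):
    """Decode one text embedding into a variable-length segment of frames.

    For each frame index t, an MLP conditioned on the embedding predicts
    the frame's features and a stop logit; decoding ends when the stop
    probability exceeds 0.5, so the segment boundary is defined by the
    model rather than by an external duration predictor.
    """
    frames = []
    for t in range(max_frames):
        x = np.concatenate([embedding, [t / max_frames]])  # condition on time
        h = np.tanh(x @ W1 + b1)
        out = h @ W2 + b2
        feat, stop_logit = out[:FEAT_DIM], out[FEAT_DIM]
        frames.append(feat)
        if 1.0 / (1.0 + np.exp(-stop_logit)) > 0.5:  # sigmoid stop criterion
            break
    return np.stack(frames)

# Each text-unit embedding yields its own segment; concatenating the segments
# produces the full frame-level sequence without an explicit alignment search.
embeddings = rng.normal(size=(3, EMB_DIM))
segments = [seg_inr(e) for e in embeddings]
total = np.concatenate(segments, axis=0)
```

The design point this illustrates is that alignment becomes a per-segment stopping decision rather than a global search, which is where the computational savings claimed in the abstract come from.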