SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech

Minchan Kim, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim

arXiv.org Artificial Intelligence 

The proposed method leverages an optimal text encoder to extract embeddings, transforming each into a segment of frame-level features using a conditional implicit neural representation (INR). This method, named segment-wise INR (SegINR), models temporal dynamics within each segment and autonomously determines segment boundaries, reducing computational cost. We integrate SegINR into a two-stage TTS framework, using it for semantic token prediction. Our experiments in zero-shot adaptive TTS scenarios demonstrate that SegINR outperforms conventional methods in speech quality while remaining computationally efficient.
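The core idea above can be illustrated with a toy sketch: a conditional INR is a small network that maps a frame index, conditioned on one text-token embedding, to a frame-level feature vector, so each token embedding expands into a variable-length segment of frames. This is a minimal NumPy sketch with invented names, dimensions, and random weights; it illustrates only the interface, not the paper's actual architecture or training.

```python
import numpy as np

class ConditionalINR:
    """Toy conditional INR: maps a frame index t, conditioned on a
    segment embedding e, to a frame-level feature vector.
    Weights are random; this only illustrates the interface."""
    def __init__(self, emb_dim=8, hidden=16, feat_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((emb_dim + 1, hidden)) * 0.1
        self.W2 = rng.standard_normal((hidden, feat_dim)) * 0.1

    def __call__(self, t, e):
        # concatenate the scalar frame index with the conditioning embedding
        x = np.concatenate([[t], e])
        h = np.tanh(x @ self.W1)
        return h @ self.W2

def decode_segment(inr, e, n_frames):
    # evaluate the INR at each frame index within the segment;
    # in SegINR the segment length is determined by the model itself
    return np.stack([inr(t, e) for t in range(n_frames)])

inr = ConditionalINR()
emb = np.zeros(8)            # one text-token embedding (dummy values)
frames = decode_segment(inr, emb, n_frames=5)
print(frames.shape)          # (5, 4): 5 frames, 4-dim features
```

Because every token embedding is decoded into its own segment independently, no auxiliary duration predictor or autoregressive frame loop over the whole utterance is needed.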
