SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim
–arXiv.org Artificial Intelligence
SegINR (segment-wise implicit neural representation) leverages an optimal text encoder to extract embeddings, transforming each one into a segment of frame-level features using a conditional implicit neural representation (INR). The method models temporal dynamics within each segment and autonomously defines segment boundaries, reducing computational costs. We integrate SegINR into a two-stage TTS framework, using it for semantic token prediction. Our experiments in zero-shot adaptive TTS scenarios demonstrate that SegINR outperforms conventional methods in speech quality while remaining computationally efficient.
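To make the segment-wise decoding idea concrete, here is a minimal sketch of a conditional INR that maps each text-encoder embedding to its own segment of frame-level features. All dimensions, weights, and the per-frame "stop" logit used to end a segment are hypothetical illustrations, not the paper's actual architecture; random untrained weights stand in for a learned model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions for illustration, not from the paper).
EMB_DIM, FEAT_DIM, HIDDEN = 8, 4, 16

# Random weights stand in for a trained conditional INR f(t, e) -> frame feature.
W1 = rng.normal(size=(EMB_DIM + 1, HIDDEN)) * 0.5
W2 = rng.normal(size=(HIDDEN, FEAT_DIM + 1)) * 0.5  # +1: a per-frame "stop" logit

def seg_inr(embedding, max_frames=10):
    """Decode one segment of frame-level features from a single text embedding.

    The INR is queried at time indices t = 0, 1, ... within the segment;
    a per-frame stop logit lets the model define its own segment boundary
    (a hypothetical boundary mechanism, for illustration only).
    """
    frames = []
    for t in range(max_frames):
        # Conditioning: normalized time index concatenated with the embedding.
        x = np.concatenate([[t / max_frames], embedding])
        h = np.tanh(x @ W1)
        out = h @ W2
        frames.append(out[:FEAT_DIM])
        if out[FEAT_DIM] > 0:  # stop logit crosses threshold -> segment boundary
            break
    return np.stack(frames)

# One segment per text-encoder embedding; segments concatenate along time,
# so no separate duration predictor or frame-level alignment is needed.
text_embeddings = rng.normal(size=(3, EMB_DIM))
segments = [seg_inr(e) for e in text_embeddings]
frame_features = np.concatenate(segments, axis=0)
print(frame_features.shape)
```

The key property sketched here is that segment lengths emerge from the model itself (each segment runs until its stop condition fires), so the total frame count varies with the input rather than being fixed in advance.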
Oct-6-2024
- Country:
- Asia > South Korea
- Genre:
- Research Report > New Finding (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.94)
- Natural Language (1.00)
- Speech > Speech Synthesis (0.67)
- Vision (1.00)