SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
Minchan Kim, Myeonghun Jeong, Joun Yeop Lee, Nam Soo Kim
–arXiv.org Artificial Intelligence
SegINR (segment-wise implicit neural representation) leverages an optimal text encoder to extract embeddings, transforming each one into a segment of frame-level features using a conditional implicit neural representation (INR). The method models temporal dynamics within each segment and autonomously defines segment boundaries, reducing computational costs. We integrate SegINR into a two-stage TTS framework, using it for semantic token prediction. Our experiments in zero-shot adaptive TTS scenarios demonstrate that SegINR outperforms conventional methods in speech quality while remaining computationally efficient.
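To make the segment-wise decoding idea concrete, here is a minimal sketch of a conditional INR that maps each text-encoder embedding to its own segment of frame-level features. All dimensions, weights, and the per-frame "stop" logit used to end a segment are hypothetical illustrations, not the paper's actual architecture; random untrained weights stand in for a learned model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions for illustration, not from the paper).
EMB_DIM, FEAT_DIM, HIDDEN = 8, 4, 16

# Random weights stand in for a trained conditional INR f(t, e) -> frame feature.
W1 = rng.normal(size=(EMB_DIM + 1, HIDDEN)) * 0.5
W2 = rng.normal(size=(HIDDEN, FEAT_DIM + 1)) * 0.5  # +1: a per-frame "stop" logit

def seg_inr(embedding, max_frames=10):
    """Decode one segment of frame-level features from a single text embedding.

    The INR is queried at time indices t = 0, 1, ... within the segment;
    a per-frame stop logit lets the model define its own segment boundary
    (a hypothetical boundary mechanism, for illustration only).
    """
    frames = []
    for t in range(max_frames):
        # Conditioning: normalized time index concatenated with the embedding.
        x = np.concatenate([[t / max_frames], embedding])
        h = np.tanh(x @ W1)
        out = h @ W2
        frames.append(out[:FEAT_DIM])
        if out[FEAT_DIM] > 0:  # stop logit crosses threshold -> segment boundary
            break
    return np.stack(frames)

# One segment per text-encoder embedding; segments concatenate along time,
# so no separate duration predictor or frame-level alignment is needed.
text_embeddings = rng.normal(size=(3, EMB_DIM))
segments = [seg_inr(e) for e in text_embeddings]
frame_features = np.concatenate(segments, axis=0)
print(frame_features.shape)
```

The key property sketched here is that segment lengths emerge from the model itself (each segment runs until its stop condition fires), so the total frame count varies with the input rather than being fixed in advance.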
Oct-6-2024
- Country:
- Asia > South Korea
- Genre:
- Research Report > New Finding (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (0.94)
- Natural Language (1.00)
- Speech > Speech Synthesis (0.67)
- Vision (1.00)