AITopics | frame shift

Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis

Agrawal, Prabhav, Koehler, Thilo, Xiu, Zhiping, Serai, Prashant, He, Qing

arXiv.org Artificial Intelligence, Jan-18-2024

Neural vocoders model the raw audio waveform and synthesize high-quality audio, but even the highly efficient ones, like MB-MelGAN and LPCNet, fail to run in real time on a low-end device such as smart glasses. A pure digital signal processing (DSP) based vocoder can be implemented via lightweight fast Fourier transforms (FFT) and is therefore an order of magnitude faster than any neural vocoder. A DSP vocoder often yields lower audio quality because it consumes over-smoothed acoustic model predictions of approximate representations of the vocal tract. In this paper, we propose an ultra-lightweight differential DSP (DDSP) vocoder that uses a jointly optimized acoustic model with a DSP vocoder, and learns without an extracted spectral feature for the vocal tract. The model achieves audio quality comparable to neural vocoders, with a high average MOS of 4.36, while being as efficient as a DSP vocoder. Our C++ implementation, without any hardware-specific optimization, runs at 15 MFLOPS, surpasses MB-MelGAN by a factor of 340 in terms of FLOPS, and achieves a vocoder-only RTF of 0.003 and an overall RTF of 0.044 while running single-threaded on a 2GHz Intel Xeon CPU.
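To make the real-time factor (RTF) figures in the abstract above concrete, here is a minimal sketch, assuming the usual definition of RTF as processing time divided by the duration of the audio produced. The function names and the 10-second clip length are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time_for(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: seconds of compute needed for a clip."""
    return rtf * audio_seconds


# RTF values quoted in the abstract.
VOCODER_RTF = 0.003
OVERALL_RTF = 0.044

# Hypothetical clip length, an assumption for illustration only.
CLIP_SECONDS = 10.0

vocoder_ms = processing_time_for(VOCODER_RTF, CLIP_SECONDS) * 1000
overall_ms = processing_time_for(OVERALL_RTF, CLIP_SECONDS) * 1000

print(f"vocoder only: about {vocoder_ms:.0f} ms for {CLIP_SECONDS:.0f} s of audio")
print(f"full system:  about {overall_ms:.0f} ms for {CLIP_SECONDS:.0f} s of audio")

# An RTF below 1 means faster than real time; 1 / RTF is the speed multiple.
print(f"vocoder speed: roughly {1 / VOCODER_RTF:.0f}x real time")
print(f"overall speed: roughly {1 / OVERALL_RTF:.1f}x real time")
```

Under this definition, an RTF of 0.003 corresponds to roughly 30 ms of compute for 10 s of audio, and an overall RTF of 0.044 to roughly 440 ms, on the hardware the abstract describes.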

dsp vocoder, neural vocoder, vocoder, (17 more...)

2401.1046

Country:

North America > United States (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Data Science > Data Quality > Data Transformation (0.54)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.52)

Frame Shift Prediction

Yong, Zheng-Xin, Watson, Patrick D., Torrent, Tiago Timponi, Czulo, Oliver, Baker, Collin F.

arXiv.org Artificial Intelligence, Jan-5-2022

Frame shift is a cross-linguistic phenomenon in translation which results in corresponding pairs of linguistic material evoking different frames. The ability to predict frame shifts enables automatic creation of multilingual FrameNets through annotation projection. Here, we propose the Frame Shift Prediction task and demonstrate that graph attention networks, combined with auxiliary training, can learn cross-linguistic frame-to-frame correspondence and predict frame shifts.

computational linguistic, frame shift, proceedings, (15 more...)

2201.01837

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
South America > Brazil (0.04)
(13 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)

Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

Yasuda, Yusuke, Wang, Xin, Takaki, Shinji, Yamagishi, Junichi

arXiv.org Machine Learning, Oct-29-2018

End-to-end speech synthesis is a promising approach that directly converts raw text to speech. Although it was shown that Tacotron2 outperforms classical pipeline systems with regard to naturalness in English, its applicability to other languages is still unknown. Japanese could be one of the most difficult languages for which to achieve end-to-end speech synthesis, largely due to its character diversity and pitch accents. Therefore, state-of-the-art systems are still based on a traditional pipeline framework that requires a separate text analyzer and duration model. Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents and compare their audio quality with classical pipeline systems under various conditions to show their pros and cons. In a large-scale listening test, we investigated the impacts of the presence of accentual-type labels, the use of forced or predicted alignments, and acoustic features used as local condition parameters of the WaveNet vocoder. Our results reveal that, although the proposed systems still do not match the quality of a top-line pipeline system for Japanese, they are important stepping stones towards end-to-end Japanese speech synthesis.

Index Terms: speech synthesis, deep learning, Tacotron

1. INTRODUCTION Tacotron [1] opened a novel path to end-to-end speech synthesis.

alignment, deep learning, speech synthesis, (21 more...)

1810.1196

Country: Asia (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Oil & Gas (0.77)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
