teochew
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
Pan, Linrong; Jiang, Chenglong; Hou, Gaoze; Gao, Ying
It encompasses both formal written language and a large number of words commonly used in daily life, demonstrating significant diversity.

V. EXPERIMENTS

In this section, we conduct TTS and ASR experiments on our Teochew-Wild dataset to validate its effectiveness. Given that state-of-the-art TTS and ASR models typically require thousands of hours of training data to converge, we selected models that are suitable for smaller datasets for verification. Specifically, in the TTS experiment, we used the autoregressive (AR) model Tacotron2 [27] and the non-autoregressive (NAR) model FastSpeech2 [28] to predict mel-spectrograms, with the HiFi-GAN [29] vocoder used to convert them into waveforms. In the ASR experiment, we trained the Fairseq S2T Transformer XS [30] with both character-based and pinyin-based annotations.
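
To illustrate the difference between character-based and pinyin-based annotations, here is a minimal sketch of how one transcript could yield two ASR target sequences. The excerpt does not describe the actual preprocessing, so everything here is an assumption: the helper names (`char_tokens`, `pinyin_tokens`, `build_vocab`) are hypothetical, and the toy lexicon entries are illustrative placeholders, not taken from the dataset.

```python
from typing import Dict, Iterable, List

# Hypothetical toy lexicon mapping a character to a romanized syllable with a
# tone digit. The entries are illustrative placeholders only.
TOY_LEXICON: Dict[str, str] = {
    "潮": "dio5",
    "州": "ziu1",
    "话": "ue7",
}


def char_tokens(text: str) -> List[str]:
    """Character-based targets: one token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def pinyin_tokens(text: str, lexicon: Dict[str, str], unk: str = "<unk>") -> List[str]:
    """Pinyin-based targets: one romanized syllable per character.

    Characters missing from the lexicon map to the unknown token.
    """
    return [lexicon.get(ch, unk) for ch in char_tokens(text)]


def build_vocab(sequences: Iterable[List[str]]) -> Dict[str, int]:
    """Assign integer ids to tokens in order of first appearance."""
    vocab: Dict[str, int] = {"<pad>": 0, "<unk>": 1}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


if __name__ == "__main__":
    transcript = "潮州话"
    print(char_tokens(transcript))
    print(pinyin_tokens(transcript, TOY_LEXICON))
```

In this sketch the two target sequences have the same length, and they differ only in their output vocabulary: written characters in one case, romanized syllables in the other.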
How a Translation App Helped My Mother and Me Say 'I Love You'
Sometimes you don't realize what you've been missing out on until it finally happens. For me, it was hearing my Chinese mother tell me she loved me for the first time. A language translation app helped make that possible. It happened earlier this year while my mother was walking my daughter and me to our car after we had spent the day at her house. My toddler told her, "I love you." She replied with the same words.