AITopics | Karita, Shigeki

Collaborating Authors

Karita, Shigeki

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

Koizumi, Yuma, Zen, Heiga, Karita, Shigeki, Ding, Yifan, Yatabe, Kohei, Morioka, Nobuyuki, Zhang, Yu, Han, Wei, Bapna, Ankur, Bacchiani, Michiel

arXiv.org Artificial IntelligenceAug-14-2023

Speech restoration (SR) is a task of converting degraded speech signals into high-quality ones. In this study, we propose a robust SR model called Miipher, and apply Miipher to a new SR application: increasing the amount of high-quality training data for speech generation by converting speech samples collected from the Web to studio-quality. To make our SR model robust against various degradation, we use (i) a speech representation extracted from w2v-BERT for the input feature, and (ii) a text representation extracted from transcripts via PnG-BERT as a linguistic conditioning feature. Experiments show that Miipher (i) is robust against various audio degradation and (ii) enable us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web. Audio samples are available at our demo page: google.github.io/df-conformer/miipher/

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2303.01664

Country: Asia > Japan > Honshū (0.14)

Genre: Research Report > New Finding (0.35)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Add feedback

Lenient Evaluation of Japanese Speech Recognition: Modeling Naturally Occurring Spelling Inconsistency

Karita, Shigeki, Sproat, Richard, Ishikawa, Haruko

arXiv.org Artificial IntelligenceJun-7-2023

Word error rate (WER) and character error rate (CER) are standard metrics in Speech Recognition (ASR), but one problem has always been alternative spellings: If one's system transcribes adviser whereas the ground truth has advisor, this will count as an error even though the two spellings really represent the same word. Japanese is notorious for ``lacking orthography'': most words can be spelled in multiple ways, presenting a problem for accurate ASR evaluation. In this paper we propose a new lenient evaluation metric as a more defensible CER measure for Japanese ASR. We create a lattice of plausible respellings of the reference transcription, using a combination of lexical resources, a Japanese text-processing system, and a neural machine translation model for reconstructing kanji from hiragana or katakana. In a manual evaluation, raters rated 95.4% of the proposed spelling variants as plausible. ASR results show that our method, which does not penalize the system for choosing a valid alternate spelling of a word, affords a 2.4%-3.1% absolute reduction in CER depending on the task.

machine learning, natural language, variation, (22 more...)

arXiv.org Artificial Intelligence

2306.0453

Country:

Europe (1.00)
North America > United States > Maryland (0.14)
North America > United States > Colorado (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)

Add feedback