Zhang, Xuedong
Language Agnostic Data-Driven Inverse Text Normalization
Chen, Szu-Jui, Paul, Debjyoti, Pang, Yutong, Su, Peng, Zhang, Xuedong
With the emergence of automatic speech recognition (ASR) models, converting spoken-form text (ASR output) to written form is urgently needed. This inverse text normalization (ITN) problem has attracted researchers from various fields. Recently, several works have shown that data-driven ITN methods can output high-quality written-form text. Due to the scarcity of labeled spoken-written datasets, studies on non-English data-driven ITN are quite limited. In this work, we propose a language-agnostic data-driven ITN framework to fill this gap. Specifically, we leverage data augmentation in conjunction with neural machine-translated data for low-resource languages. Moreover, we design an evaluation method for the language-agnostic ITN model when only English data is available. Our empirical evaluation shows that this language-agnostic modeling approach is effective for low-resource languages while preserving performance for high-resource languages.
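To make the spoken-to-written mapping that ITN performs concrete, here is a minimal lookup-based sketch of the task's input and output. The dictionary entries and function name are illustrative assumptions, not the paper's neural model, which learns this mapping from data.

```python
# Toy illustration of the ITN task: mapping spoken-form ASR output to
# written form. A lookup table stands in for the learned model.

SPOKEN_TO_WRITTEN = {
    "twenty five dollars": "$25",
    "three point one four": "3.14",
    "january first twenty twenty": "January 1, 2020",
}

def toy_itn(spoken: str) -> str:
    """Return the written form if known, otherwise echo the input."""
    return SPOKEN_TO_WRITTEN.get(spoken.lower().strip(), spoken)

if __name__ == "__main__":
    for utterance in SPOKEN_TO_WRITTEN:
        print(f"{utterance!r} -> {toy_itn(utterance)!r}")
```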
Improving Data Driven Inverse Text Normalization using Data Augmentation
Pandey, Laxmi, Paul, Debjyoti, Chitkara, Pooja, Pang, Yutong, Zhang, Xuedong, Schubert, Kjell, Chou, Mark, Liu, Shu, Saraf, Yatharth
Inverse text normalization (ITN) is used to convert the spoken-form output of an automatic speech recognition (ASR) system to written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain, while neural modeling approaches require large-scale, high-quality spoken-written pair examples in the same or a similar domain as the ASR system (in-domain data) for training. Both approaches require costly and complex annotations. In this paper, we present a data augmentation technique that effectively generates rich spoken-written numeric pairs from out-of-domain textual data with minimal human annotation. We empirically demonstrate that an ITN model trained using our data augmentation technique consistently outperforms an ITN model trained using only in-domain data across all numeric surfaces, such as cardinal, currency, and fraction, by an overall accuracy of 14.44%.
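The core idea of the augmentation, generating spoken-written numeric pairs from ordinary written text, can be sketched as below. The regex, the tiny cardinal verbalizer, and the function names are assumptions for illustration; the paper's pipeline covers far more numeric surfaces (currency, fractions, etc.).

```python
# Minimal sketch: harvest written-form numbers from out-of-domain text
# and verbalize them to spoken form, yielding (spoken, written) pairs.
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def verbalize(n: int) -> str:
    """Spoken form for 0..99 (toy coverage only)."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])

def augment(text: str):
    """Yield (spoken, written) pairs for every 1-2 digit number in text."""
    for match in re.finditer(r"\b\d{1,2}\b", text):
        written = match.group()
        yield verbalize(int(written)), written

if __name__ == "__main__":
    sample = "The recipe needs 12 eggs and 45 grams of sugar."
    for spoken, written in augment(sample):
        print(f"{spoken!r} -> {written!r}")
```

Pairs produced this way ("forty five" -> "45") can then be mixed with in-domain data when training the ITN model.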
Speech recognition for medical conversations
Chiu, Chung-Cheng, Tripathi, Anshuman, Chou, Katherine, Co, Chris, Jaitly, Navdeep, Jaunzeikare, Diana, Kannan, Anjuli, Nguyen, Patrick, Sak, Hasim, Sankar, Ananth, Tansuwan, Justin, Wan, Nathan, Wu, Yonghui, Zhang, Xuedong
In this paper we document our experiences developing speech recognition for medical transcription - a system that automatically transcribes doctor-patient conversations. Towards this goal, we built systems along two different methodological lines - a Connectionist Temporal Classification (CTC) phoneme-based model and a Listen, Attend and Spell (LAS) grapheme-based model. To train these models we used a corpus of anonymized conversations representing approximately 14,000 hours of speech. Because of noisy transcripts and alignments in the corpus, a significant amount of effort was invested in data cleaning. We describe the two-stage strategy we followed for segmenting the data. The data cleanup and the development of a matched language model were essential to the success of the CTC-based models. The LAS-based models, however, were found to be resilient to alignment and transcript noise and did not require language models. The CTC models achieved a word error rate of 20.1%, and the LAS models achieved 18.3%. Our analysis shows that both models perform well on important medical utterances and therefore can be practical for transcribing medical conversations.
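The abstract reports results as word error rate (WER). The standard edit-distance computation behind that metric is sketched below; the code and example strings are illustrative and not taken from the paper's evaluation tooling.

```python
# Standard WER: (substitutions + insertions + deletions) / reference length,
# computed with a word-level Levenshtein distance.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    ref = "the patient reports mild chest pain"
    hyp = "the patient reports a mild chess pain"
    print(f"WER = {wer(ref, hyp):.2%}")  # 2 errors / 6 ref words = 33.33%
```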