AITopics | pinyin

Collaborating Authors

pinyin

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Chain of Correction for Full-text Speech Recognition with Large Language Models

Tang, Zhiyuan, Wang, Dong, Zhou, Zhikai, Liu, Yong, Huang, Shen, Shang, Shidong

arXiv.org Artificial IntelligenceAug-21-2025

Full-text error correction with Large Language Models (LLMs) for Automatic Speech Recognition (ASR) is attracting increased attention for its ability to address a wide range of error types, such as punctuation restoration and inverse text normalization, across long context. However, challenges remain regarding stability, controllability, completeness, and fluency. To mitigate these issues, this paper proposes the Chain of Correction (CoC), which uses a multi-turn chat format to correct errors segment by segment, guided by pre-recognized text and full-text context for better semantic understanding. Utilizing the open-sourced ChFT dataset, we fine-tune a pre-trained LLM to evaluate CoC's performance. Experiments show that CoC significantly outperforms baseline and benchmark systems in correcting full-text ASR outputs. We also analyze correction thresholds to balance under-correction and over-rephrasing, extrapolate CoC on extra-long ASR outputs, and explore using other types of information to guide error correction.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2504.01519

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Deng, Wei, Zhou, Siyi, Shu, Jingchen, Wang, Jinchao, Wang, Lu

arXiv.org Artificial IntelligenceFeb-8-2025

Recently, large language model (LLM) based text-to-speech (TTS) systems have gradually become the mainstream in the industry due to their high naturalness and powerful zero-shot voice cloning capabilities.Here, we introduce the IndexTTS system, which is mainly based on the XTTS and Tortoise model. We add some novel improvements. Specifically, in Chinese scenarios, we adopt a hybrid modeling method that combines characters and pinyin, making the pronunciations of polyphonic characters and long-tail characters controllable. We also performed a comparative analysis of the Vector Quantization (VQ) with Finite-Scalar Quantization (FSQ) for codebook utilization of acoustic speech tokens. To further enhance the effect and stability of voice cloning, we introduce a conformer-based speech conditional encoder and replace the speechcode decoder with BigVGAN2. Compared with XTTS, it has achieved significant improvements in naturalness, content consistency, and zero-shot voice cloning. As for the popular TTS systems in the open-source, such as Fish-Speech, CosyVoice2, FireRedTTS and F5-TTS, IndexTTS has a relatively simple training process, more controllable usage, and faster inference speed. Moreover, its performance surpasses that of these systems. Our demos are available at https://index-tts.github.io.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.05512

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.74)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Learning from Mistakes: Self-correct Adversarial Training for Chinese Unnatural Text Correction

Feng, Xuan, Gu, Tianlong, Liu, Xiaoli, Chang, Liang

arXiv.org Artificial IntelligenceDec-22-2024

Unnatural text correction aims to automatically detect and correct spelling errors or adversarial perturbation errors in sentences. Existing methods typically rely on fine-tuning or adversarial training to correct errors, which have achieved significant success. However, these methods exhibit poor generalization performance due to the difference in data distribution between training data and real-world scenarios, known as the exposure bias problem. In this paper, we propose a self-correct adversarial training framework for \textbf{L}earn\textbf{I}ng from \textbf{MI}s\textbf{T}akes (\textbf{LIMIT}), which is a task- and model-independent framework to correct unnatural errors or mistakes. Specifically, we fully utilize errors generated by the model that are actively exposed during the inference phase, i.e., predictions that are inconsistent with the target. This training method not only simulates potential errors in real application scenarios, but also mitigates the exposure bias of the traditional training process. Meanwhile, we design a novel decoding intervention strategy to maintain semantic consistency. Extensive experimental results on Chinese unnatural text error correction datasets show that our proposed method can correct multiple forms of errors and outperforms the state-of-the-art text correction methods. In addition, extensive results on Chinese and English datasets validate that LIMIT can serve as a plug-and-play defense module and can extend to new models and datasets without further training.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.17279

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(7 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs

Yuhang, Yang, Yizhou, Peng, Chng, Eng Siong, Zhong, Xionghu

arXiv.org Artificial IntelligenceSep-24-2024

The integration of large language models (LLMs) with pre-trained speech models has opened up new avenues in automatic speech recognition (ASR). While LLMs excel in multimodal understanding tasks, effectively leveraging their capabilities for ASR remains a significant challenge. This paper presents a novel training approach to enhance LLM performance in ASR tasks. We propose pre-training LLMs on Pinyin embedding sequences, which represent pronunciation features, to generate corresponding Chinese characters. This step enables the LLM to adapt to generating text from pronunciation features before encountering real speech data. Furthermore, we fine-tune the LoRA parameters to enhance the LLM's understanding of speech modality information. In AISHELL-1 corpus, our approach yields a 9.5% relative improvement in ASR tasks compared to the baseline without Pinyi-to-Character pre-training. Additionally, incorporating auxiliary text data for Pinyi-to-Character pre-training further boosts performance, achieving a 19.0% relative improvement.

arxiv preprint arxiv, llm, pinyin, (13 more...)

arXiv.org Artificial Intelligence

2409.16005

Country:

Asia > Singapore (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Large Language Model Should Understand Pinyin for Chinese ASR Error Correction

Li, Yuang, Qiao, Xiaosong, Zhao, Xiaofeng, Zhao, Huan, Tang, Wei, Zhang, Min, Yang, Hao

arXiv.org Artificial IntelligenceSep-20-2024

Large language models can enhance automatic speech recognition systems through generative error correction. In this paper, we propose Pinyin-enhanced GEC, which leverages Pinyi, the phonetic representation of Mandarin Chinese, as supplementary information to improve Chinese ASR error correction. Our approach only utilizes synthetic errors for training and employs the one-best hypothesis during inference. Additionally, we introduce a multitask training approach involving conversion tasks between Pinyin and text to align their feature spaces. Experiments on the Aishell-1 and the Common Voice datasets demonstrate that our approach consistently outperforms GEC with text-only input. More importantly, we provide intuitive explanations for the effectiveness of PY-GEC and multitask training from two aspects: 1) increased attention weight on Pinyin features; and 2) aligned feature space between Pinyin and text hidden states.

error correction, pinyin, pinyin feature, (15 more...)

arXiv.org Artificial Intelligence

2409.13262

Country:

South America > Paraguay > Asunción > Asunción (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Can Pre-trained Language Models Understand Chinese Humor?

Chen, Yuyan, Li, Zhixu, Liang, Jiaqing, Xiao, Yanghua, Liu, Bang, Chen, Yunwen

arXiv.org Artificial IntelligenceJul-4-2024

Humor understanding is an important and challenging research in natural language processing. As the popularity of pre-trained language models (PLMs), some recent work makes preliminary attempts to adopt PLMs for humor recognition and generation. However, these simple attempts do not substantially answer the question: {\em whether PLMs are capable of humor understanding?} This paper is the first work that systematically investigates the humor understanding ability of PLMs. For this purpose, a comprehensive framework with three evaluation steps and four evaluation tasks is designed. We also construct a comprehensive Chinese humor dataset, which can fully meet all the data requirements of the proposed evaluation framework. Our empirical study on the Chinese humor dataset yields some valuable observations, which are of great guiding value for future optimization of PLMs in humor understanding and generation.

humor dataset, knowledge, plm, (13 more...)

arXiv.org Artificial Intelligence

2407.04105

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore > Central Region > Singapore (0.05)
Asia > China > Shanghai > Shanghai (0.05)
(10 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models

Tang, Zhiyuan, Wang, Dong, Huang, Shen, Shang, Shidong

arXiv.org Artificial IntelligenceJul-1-2024

Recent studies have demonstrated the efficacy of large language models (LLMs) in error correction for automatic speech recognition (ASR). However, much of the research focuses on the English language. This paper redirects the attention to Chinese. Firstly, we construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypotheses-transcription pairs, named the Chinese Hypotheses Paradise dataset (ChineseHP), which contains a wide range of scenarios and presents significant challenges. Subsequently, we conduct a preliminary evaluation using the dataset for both direct-prompting and fine-tuning pre-trained LLMs. Furthermore, we propose a straightforward method of Pinyin regularization for prompts, which involves the transcription of Pinyin directly from text hypotheses. The experimental results reveal that Pinyin regularization consistently enhances the error-correcting ability of LLMs when compared with those without regularization. The dataset is available on the website.

dataset, error correction, hypothesis, (13 more...)

arXiv.org Artificial Intelligence

2407.01909

Country: Asia > China (0.05)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

MFE-NER: Multi-feature Fusion Embedding for Chinese Named Entity Recognition

Li, Jiatong, Meng, Kui

arXiv.org Artificial IntelligenceApr-17-2024

In Chinese Named Entity Recognition, character substitution is a complicated linguistic phenomenon. Some Chinese characters are quite similar as they share the same components or have similar pronunciations. People replace characters in a named entity with similar characters to generate a new collocation but referring to the same object. As a result, it always leads to unrecognizable or mislabeling errors in the NER task. In this paper, we propose a lightweight method, MFE-NER, which fuses glyph and phonetic features, to help pre-trained language models handle the character substitution problem in the NER task with limited extra cost. Basically, in the glyph domain, we disassemble Chinese characters into Five-Stroke components to represent structure features. In the phonetic domain, an improved phonetic system is proposed in our work, making it reasonable to describe phonetic similarity among Chinese characters. Experiments demonstrate that our method performs especially well in detecting character substitutions while slightly improving the overall performance of Chinese NER.

chinese character, mfe-ner, pre-trained language model, (14 more...)

arXiv.org Artificial Intelligence

2109.07877

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring Chinese Humor Generation: A Study on Two-Part Allegorical Sayings

Xu, Rongwu

arXiv.org Artificial IntelligenceMar-15-2024

Humor, a culturally nuanced aspect of human language, poses challenges for computational understanding and generation, especially in Chinese humor, which remains relatively unexplored in the NLP community. This paper investigates the capability of state-of-the-art language models to comprehend and generate Chinese humor, specifically focusing on training them to create allegorical sayings. We employ two prominent training methods: fine-tuning a medium-sized language model and prompting a large one. Our novel fine-tuning approach incorporates fused Pinyin embeddings to consider homophones and employs contrastive learning with synthetic hard negatives to distinguish humor elements. Human-annotated results show that these models can generate humorous allegorical sayings, with prompting proving to be a practical and effective method. However, there is still room for improvement in generating allegorical sayings that match human creativity.

explanation, learning, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2403.10781

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)

Add feedback

External Knowledge Augmented Polyphone Disambiguation Using Large Language Model

Li, Chen

arXiv.org Artificial IntelligenceDec-19-2023

One of the key issues in Mandarin Chinese text-to-speech (TTS) systems is polyphone disambiguation when doing grapheme-to-phoneme (G2P) conversion. In this paper, we introduce a novel method to solve the problem as a generation task. Following the trending research of large language models (LLM) and prompt learning, the proposed method consists of three modules. Retrieval module incorporates external knowledge which is a multi-level semantic dictionary of Chinese polyphonic characters to format the sentence into a prompt. Generation module adopts the decoder-only Transformer architecture to induce the target text. Postprocess module corrects the generated text into a valid result if needed. Experimental results show that our method outperforms the existing methods on a public dataset called CPP. We also empirically study the impacts of different templates of the prompt, different sizes of training data, and whether to incorporate external knowledge.

disambiguation, polyphone disambiguation, polyphonic character, (11 more...)

arXiv.org Artificial Intelligence

2312.1192

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback