AITopics | mandarin chinese

Unsupervised Learning and Representation of Mandarin Tonal Categories by a Generative CNN

Schenck, Kai, Beguš, Gašper

arXiv.org Artificial Intelligence | Sep-23-2025

This paper outlines the methodology for modeling tonal learning in fully unsupervised models of human language acquisition. Tonal patterns are among the computationally most complex learning objectives in language. We argue that a realistic generative model of human language (ciwGAN) can learn to associate its categorical variables with Mandarin Chinese tonal categories without any labeled data. All three trained models showed statistically significant differences in F0 across categorical variables. The model trained solely on male tokens consistently encoded tone. Our results suggest that not only does the model learn Mandarin tonal contrasts, but it learns a system that corresponds to a stage of acquisition in human language learners. We also outline methodology for tracing tonal representations in internal convolutional layers, which shows that linguistic tools can contribute to interpretability of deep learning and can ultimately be used in neural experiments.

artificial intelligence, category, machine learning, (16 more...)

arXiv: 2509.17859

Country:

Europe (1.00)
North America > United States > California (0.46)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tone recognition in low-resource languages of North-East India: peeling the layers of SSL-based speech models

Gogoi, Parismita, Kalita, Sishir, Lalhminghlui, Wendy, Terhiija, Viyazonuo, Tzudir, Moakala, Sarmah, Priyankoo, Prasanna, S. R. M.

arXiv.org Artificial Intelligence | Jun-5-2025

This study explores the use of self-supervised learning (SSL) models for tone recognition in three low-resource languages from North Eastern India: Angami, Ao, and Mizo. We evaluate four Wav2vec2.0 base models that were pre-trained on both tonal and non-tonal languages. We analyze tone-wise performance across the layers for all three languages and compare the different models. Our results show that tone recognition works best for Mizo and worst for Angami. The middle layers of the SSL models are the most important for tone recognition, regardless of the pre-training language, i.e. tonal or non-tonal. We have also found that the tone inventory, tone types, and dialectal variations affect tone recognition. These findings provide useful insights into the strengths and weaknesses of SSL-based embeddings for tonal languages and highlight the potential for improving tone recognition in low-resource settings. The source code is available on GitHub.

machine learning, natural language, recognition, (21 more...)

arXiv: 2506.03606

Country:

Europe (1.00)
Asia > India > Nagaland (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

Xu, Hainan, Chen, Zhehuai, Jia, Fei, Ginsburg, Boris

arXiv.org Artificial Intelligence | Apr-4-2024

This paper proposes Transducers with Pronunciation-aware Embeddings (PET). Unlike conventional Transducers where the decoder embeddings for different tokens are trained independently, the PET model's decoder embedding incorporates shared components for text tokens with the same or similar pronunciations. With experiments conducted in multiple datasets in Mandarin Chinese and Korean, we show that PET models consistently improve speech recognition accuracy compared to conventional Transducers. Our investigation also uncovers a phenomenon that we call error chain reactions. Instead of recognition errors being evenly spread throughout an utterance, they tend to group together, with subsequent errors often following earlier ones. Our analysis shows that PET models effectively mitigate this issue by substantially reducing the likelihood of the model generating additional errors following a prior one. Our implementation will be open-sourced with the NeMo toolkit.

decoder, pronunciation, transducer, (14 more...)

arXiv: 2404.04295

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Seamless: Multilingual Expressive and Streaming Speech Translation

Seamless Communication, Barrault, Loïc, Chung, Yu-An, Meglioli, Mariano Coria, Dale, David, Dong, Ning, Duppenthaler, Mark, Duquenne, Paul-Ambroise, Ellis, Brian, Elsahar, Hady, Haaheim, Justin, Hoffman, John, Hwang, Min-Jae, Inaguma, Hirofumi, Klaiber, Christopher, Kulikov, Ilia, Li, Pengwei, Licht, Daniel, Maillard, Jean, Mavlyutov, Ruslan, Rakotoarison, Alice, Sadagopan, Kaushik Ram, Ramakrishnan, Abinesh, Tran, Tuan, Wenzek, Guillaume, Yang, Yilin, Ye, Ethan, Evtimov, Ivan, Fernandez, Pierre, Gao, Cynthia, Hansanti, Prangthip, Kalbassi, Elahe, Kallet, Amanda, Kozhevnikov, Artyom, Gonzalez, Gabriel Mejia, Roman, Robin San, Touret, Christophe, Wong, Corinne, Wood, Carleigh, Yu, Bokai, Andrews, Pierre, Balioglu, Can, Chen, Peng-Jen, Costa-jussà, Marta R., Elbayad, Maha, Gong, Hongyu, Guzmán, Francisco, Heffernan, Kevin, Jain, Somya, Kao, Justine, Lee, Ann, Ma, Xutai, Mourachko, Alex, Peloquin, Benjamin, Pino, Juan, Popuri, Sravya, Ropers, Christophe, Saleem, Safiyyah, Schwenk, Holger, Sun, Anna, Tomasello, Paden, Wang, Changhan, Wang, Jeff, Wang, Skyler, Williamson, Mary

arXiv.org Artificial Intelligence | Dec-8-2023

Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model: SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. Consequently, we bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time. The contributions to this work are publicly released and accessible at https://github.com/facebookresearch/seamless_communication

detection and mitigation, readiness and information access, seamlessm4t-large system, (16 more...)

arXiv: 2312.05187

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.13)
Europe > Italy > Tuscany > Florence (0.04)
North America > Canada > Ontario > Toronto (0.04)
(19 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)

The Impact of Familiarity on Naming Variation: A Study on Object Naming in Mandarin Chinese

He, Yunke, Liao, Xixian, Liang, Jialing, Boleda, Gemma

arXiv.org Artificial Intelligence | Nov-16-2023

Different speakers often produce different names for the same object or entity (e.g., "woman" vs. "tourist" for a female tourist). The reasons behind variation in naming are not well understood. We create a Language and Vision dataset for Mandarin Chinese that provides an average of 20 names for 1319 naturalistic images, and investigate how familiarity with a given kind of object relates to the degree of naming variation it triggers across subjects. We propose that familiarity influences naming variation in two competing ways: increasing familiarity can either expand vocabulary, leading to higher variation, or promote convergence on conventional names, thereby reducing variation. We find evidence for both factors being at play. Our study illustrates how computational resources can be used to address research questions in Cognitive Science.

familiarity, mandarin chinese, variation, (16 more...)

arXiv: 2311.10181

Country:

Europe > Austria > Vienna (0.14)
Asia > China (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Setting > Higher Education (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.88)
(2 more...)

Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation

Costa-jussà, Marta R., Dale, David, Elbayad, Maha, Yu, Bokai

arXiv.org Artificial Intelligence | Nov-11-2023

Added toxicity in the context of translation refers to the fact of producing a translation output with more toxicity than there exists in the input. In this paper, we present MinTox which is a novel pipeline to identify added toxicity and mitigate this issue which works at inference time. MinTox uses a toxicity detection classifier which is multimodal (speech and text) and works in languages at scale. The mitigation method is applied to languages at scale and directly in text outputs. MinTox is applied to SEAMLESSM4T, which is the latest multimodal and massively multilingual machine translation system. For this system, MinTox achieves significant added toxicity mitigation across domains, modalities and language directions. MinTox manages to approximately filter out from 25% to 95% of added toxicity (depending on the modality and domain) while keeping translation quality.

mintox, toxicity, translation, (17 more...)

arXiv: 2311.06532

Country:

Africa > Tanzania (0.05)
Africa > Kenya (0.05)
North America > United States > Pennsylvania (0.04)
(5 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation

Qiang, Chunyu, Yang, Peng, Che, Hao, Xiao, Jinba, Wang, Xiaorui, Wang, Zhongyuan

arXiv.org Artificial Intelligence | Nov-17-2022

Conversion of Chinese Grapheme-to-Phoneme (G2P) plays an important role in Mandarin Chinese Text-To-Speech (TTS) systems, where one of the biggest challenges is the task of polyphone disambiguation. Most of the previous polyphone disambiguation models are trained on manually annotated datasets, and publicly available datasets for polyphone disambiguation are scarce. In this paper we propose a simple back-translation-style data augmentation method for Mandarin Chinese polyphone disambiguation, utilizing a large amount of unlabeled text data. Inspired by the back-translation technique proposed in the field of machine translation, we build a Grapheme-to-Phoneme (G2P) model to predict the pronunciation of polyphonic characters, and a Phoneme-to-Grapheme (P2G) model to predict pronunciation into text. Meanwhile, a window-based matching strategy and a multi-model scoring strategy are proposed to judge the correctness of the pseudo-label. We design a data balance strategy to improve the accuracy of some typical polyphonic characters in the training set with imbalanced distribution or data scarcity. The experimental result shows the effectiveness of the proposed back-translation-style data augmentation method.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv: 2211.09495

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Chinese Learners' Phonetic Transfer of /i/ from Mandarin Chinese to General American English: A Case Study of a Chinese Learner with Advanced English

Chen, Lintao

arXiv.org Artificial Intelligence | Oct-20-2022

The current paper concerns language transfer at the phonetic level and concentrates on the transfer phenomenon in an advanced English language learner's acquisition of the English vowels /i/ and its lax counterpart. By determining whether the Chinese English-language learner (ELL), named Vanya, can accurately distinguish between /i/ and its lax counterpart, and pronounce them precisely in General American English (GAE), this paper serves as a reference for further studying language transfer among Chinese ELLs. There were two objectives: first, the learner's perceptual ability to distinguish between vowels /i/ and its lax counterpart was examined; second, the effect of the phonetic transfer was determined. Two perception tests and a production test were used to attain these two objectives. The results of two perception tests demonstrated Vanya's perceptual competence in distinguishing between /i/ and its lax counterpart and laid a solid foundation for the validity of the subsequent production test. Given that Vanya's production of F1 and F2 values of /i/ were highly similar across his first language (Mandarin Chinese) and second language (GAE) and that both values were lower than the typical values for common /i/ in GAE, with an especially prominent disparity between the F2 values, it is reasonable to conclude that a phonetic transfer occurred. The participant's high perceptual competence as an advanced-level ELL did not noticeably moderate the effect of phonetic transfer.

artificial intelligence, phonetic transfer, vanya, (15 more...)

arXiv: 2112.13571

Country:

Asia > China (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence (0.49)

Language: People across cultures agree the word 'bouba' sounds round while 'kiki' sounds pointy

Daily Mail - Science & tech | Nov-17-2021, 13:23:43 GMT

From English to Chinese, Hungarian and Zulu, people who speak different languages make the same links between sounds and shapes, a new study shows. An international research team has conducted the largest cross-cultural test of the 'bouba-kiki effect' – the tendency to associate made-up words 'bouba' with a round shape and 'kiki' with a spiky shape. The researchers surveyed 917 speakers of 25 different languages representing nine language families and 10 writing systems. They found the effect exists independently of the language that a person speaks or the writing system that they use, whether it's the Roman alphabet (A, B, C), the Greek alphabet (alpha, beta, gamma) or Chinese characters (北, 方, 话). Such universally-meaningful vocalisations may form a global basis for the creation of new words, such as terms that circulate on social media.

alphabet, bouba-kiki effect, made-up word, (15 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence (0.39)

Visually Grounded Reasoning across Languages and Cultures

Liu, Fangyu, Bugliarello, Emanuele, Ponti, Edoardo Maria, Reddy, Siva, Collier, Nigel, Elliott, Desmond

arXiv.org Artificial Intelligence | Oct-21-2021

The design of widespread vision-and-language datasets and pre-trained encoders directly adopts, or draws inspiration from, the concepts and images of ImageNet. While one can hardly overestimate how much this benchmark contributed to progress in computer vision, it is mostly derived from lexical databases and image queries in English, resulting in source material with a North American or Western European bias. Therefore, we devise a new protocol to construct an ImageNet-style hierarchy representative of more languages and cultures. In particular, we let the selection of both concepts and images be entirely driven by native speakers, rather than scraping them automatically. Specifically, we focus on a typologically diverse set of languages, namely, Indonesian, Mandarin Chinese, Swahili, Tamil, and Turkish. On top of the concepts and images obtained through this new protocol, we create a multilingual dataset for Multicultural Reasoning over Vision and Language (MaRVL) by eliciting statements from native speaker annotators about pairs of images. The task consists of discriminating whether each grounded statement is true or false. We establish a series of baselines using state-of-the-art models and find that their cross-lingual transfer performance lags dramatically behind supervised performance in English. These results invite us to reassess the robustness and accuracy of current state-of-the-art models beyond a narrow domain, but also open up new exciting challenges for the development of truly multilingual and multicultural systems.

annotator, computational linguistic, dataset, (16 more...)

arXiv: 2109.13238

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(34 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(3 more...)
