mandarin
- North America > United States (0.04)
- Asia > Singapore (0.04)
Democratic or Authoritarian? Probing a New Dimension of Political Biases in Large Language Models
Piedrahita, David Guzman, Strauss, Irene, Schölkopf, Bernhard, Mihalcea, Rada, Jin, Zhijing
As Large Language Models (LLMs) become increasingly integrated into everyday life and information ecosystems, concerns about their implicit biases persist. While prior work has primarily examined socio-demographic and left-right political dimensions, little attention has been paid to how LLMs align with broader geopolitical value systems, particularly the democracy-authoritarianism spectrum. In this paper, we propose a novel methodology to assess such alignment, combining (1) the F-scale, a psychometric tool for measuring authoritarian tendencies, (2) FavScore, a newly introduced metric for evaluating model favorability toward world leaders, and (3) role-model probing to assess which figures LLMs cite as general role models. We find that LLMs generally favor democratic values and leaders, but exhibit increased favorability toward authoritarian figures when prompted in Mandarin. Further, models often cite authoritarian figures as role models, even outside explicit political contexts. These results shed light on ways LLMs may reflect and potentially reinforce global political ideologies, highlighting the importance of evaluating bias beyond conventional socio-political axes. Our code is available at: https://github.com/irenestrauss/Democratic-Authoritarian-Bias-LLMs.
- North America > Cuba (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Syria (0.14)
- (185 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Law (0.67)
- Government > Regional Government > Asia Government > Middle East Government (0.46)
EM2LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning
Li, Xingfeng, Shi, Xiaohan, Li, Junjie, Li, Yongwei, Unoki, Masashi, Toda, Tomoki, Akagi, Masato
This study introduces EM2LDL, a novel multilingual speech corpus designed to advance mixed emotion recognition through label distribution learning. Addressing the limitations of predominantly monolingual and single-label emotion corpora that restrict linguistic diversity, are unable to model mixed emotions, and lack ecological validity, EM2LDL comprises expressive utterances in English, Mandarin, and Cantonese, capturing the intra-utterance code-switching prevalent in multilingual regions like Hong Kong and Macao. The corpus integrates spontaneous emotional expressions from online platforms, annotated with fine-grained emotion distributions across 32 categories. Experimental baselines using self-supervised learning models demonstrate robust performance in speaker-independent gender-, age-, and personality-based evaluations, with HuBERT-large-EN achieving optimal results. By incorporating linguistic diversity and ecological validity, EM2LDL enables the exploration of complex emotional dynamics in multilingual settings. This work provides a versatile testbed for developing adaptive, empathetic systems for applications in affective computing, including mental health monitoring and cross-cultural communication. The dataset, annotations, and baseline codes are publicly available at https://github.com/xingfengli/EM2LDL.
- Asia > Macao (0.34)
- Asia > China > Hong Kong (0.25)
- Oceania > Australia > Australian Capital Territory > Canberra (0.05)
- (8 more...)
- Overview (0.93)
- Research Report > New Finding (0.93)
CantoNLU: A benchmark for Cantonese natural language understanding
Min, Junghyun, Ng, York Hay, Chan, Sophia, Zhao, Helena Shunhua, Lee, En-Shiun Annie
Cantonese, although spoken by millions, remains under-resourced due to policy and diglossia. To address this scarcity of evaluation frameworks for Cantonese, we introduce CantoNLU, a benchmark for Cantonese natural language understanding (NLU). This novel benchmark spans seven tasks covering syntax and semantics, including word sense disambiguation, linguistic acceptability judgment, language detection, natural language inference, sentiment analysis, part-of-speech tagging, and dependency parsing. In addition to the benchmark, we provide model baseline performance across a set of models: a Mandarin model without Cantonese training, two Cantonese-adapted models obtained by continual pre-training a Mandarin model on Cantonese text, and a monolingual Cantonese model trained from scratch. Results show that Cantonese-adapted models perform best overall, while monolingual models perform better on syntactic tasks. Mandarin models remain competitive in certain settings, indicating that direct transfer may be sufficient when Cantonese domain data is scarce. We release all datasets, code, and model weights to facilitate future research in Cantonese NLP.
- North America > Canada > Ontario > Toronto (0.86)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- (22 more...)
"Be My Cheese?": Assessing Cultural Nuance in Multilingual LLM Translations
Van Doren, Madison, Holland, Cory
This pilot study explores the localisation capabilities of state-of-the-art multilingual AI models when translating figurative language, such as idioms and puns, from English into a diverse range of global languages. It expands on existing LLM translation research and industry benchmarks, which emphasise grammatical accuracy and token-level correctness, by focusing on cultural appropriateness and overall localisation quality - critical factors for real-world applications like marketing and e-commerce. To investigate these challenges, this project evaluated a sample of 87 LLM-generated translations of e-commerce marketing emails across 24 regional dialects of 20 languages. Human reviewers fluent in each target language provided quantitative ratings and qualitative feedback on faithfulness to the original's tone, meaning, and intended audience. Findings suggest that, while leading models generally produce grammatically correct translations, culturally nuanced language remains a clear area for improvement, often requiring substantial human refinement. Notably, even high-resource global languages, despite topping industry benchmark leaderboards, frequently mistranslated figurative expressions and wordplay. This work challenges the assumption that data volume is the most reliable predictor of machine translation quality and introduces cultural appropriateness as a key determinant of multilingual LLM performance - an area currently underexplored in existing academic and industry benchmarks. As a proof of concept, this pilot highlights limitations of current multilingual AI systems for real-world localisation use cases. Results of this pilot support the opportunity for expanded research at greater scale to deliver generalisable insights and inform deployment of reliable machine translation workflows in culturally diverse contexts.
- Europe > United Kingdom (0.14)
- Asia > India (0.06)
- South America > Brazil (0.04)
- (20 more...)
Towards Unsupervised Speech Recognition at the Syllable-Level
Wang, Liming, Ni, Junrui, Chang, Kai-Wei, Bhati, Saurabhchand, Harwath, David, Hasegawa-Johnson, Mark, Glass, James R.
Training speech recognizers with unpaired speech and text, known as unsupervised speech recognition (UASR), is a crucial step toward extending ASR to low-resource languages in the long-tail distribution and enabling multimodal learning from non-parallel data. However, existing approaches based on phones often rely on costly resources such as grapheme-to-phoneme converters (G2Ps) and struggle to generalize to languages with ambiguous phoneme boundaries due to training instability. In this paper, we address both challenges by introducing a syllable-level UASR framework based on masked language modeling, which avoids the need for G2P and the instability of GAN-based methods. Our approach achieves up to a 40% relative reduction in character error rate (CER) on LibriSpeech and generalizes effectively to Mandarin, a language that has remained particularly difficult for prior methods. Code will be released upon acceptance.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- Asia > South Korea > Incheon > Incheon (0.04)
- (10 more...)
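The syllable-level UASR abstract above reports its gain as a relative reduction in character error rate (CER). As a minimal sketch of how such a figure is commonly computed, the snippet below uses the standard definition of CER (character-level edit distance divided by reference length). The function names and the example numbers are hypothetical and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction of `new` with respect to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - new) / baseline


# Hypothetical numbers: a baseline CER of 10% lowered to 6%
# corresponds to a 40% relative reduction.
print(relative_reduction(0.10, 0.06))
```

Note that a relative reduction is measured against the baseline error rate, so a 40% relative reduction does not mean the CER dropped by 40 percentage points.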
Benchmarking Diarization Models
Lanzendörfer, Luca A., Grötschla, Florian, Blaser, Cesare, Wattenhofer, Roger
Speaker diarization is the task of partitioning audio into segments according to speaker identity, answering the question of "who spoke when" in multi-speaker conversation recordings. While diarization is an essential task for many downstream applications, it remains an unsolved problem. Errors in diarization propagate to downstream systems and cause wide-ranging failures. To this end, we examine exact failure modes by evaluating five state-of-the-art diarization models across four diarization datasets spanning multiple languages and acoustic conditions. The evaluation datasets consist of 196.6 hours of multilingual audio, including English, Mandarin, German, Japanese, and Spanish. Overall, we find that PyannoteAI achieves the best performance at 11.2% DER, while DiariZen provides a competitive open-source alternative at 13.3% DER. When analyzing failure cases, we find that diarization errors stem primarily from missed speech segments, followed by speaker confusion, especially in high-speaker-count settings.
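The DER figures above can be read against the conventional definition of diarization error rate: the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The sketch below assumes that standard definition; the function name and the durations are hypothetical, and the abstract does not specify the exact scoring configuration (for example, collar or overlap handling) the authors used.

```python
def diarization_error_rate(missed: float,
                           false_alarm: float,
                           confusion: float,
                           total_reference_speech: float) -> float:
    """DER = (missed + false alarm + speaker confusion) / reference speech.

    All arguments are durations in the same unit (e.g. seconds).
    """
    if total_reference_speech <= 0:
        raise ValueError("total reference speech time must be positive")
    return (missed + false_alarm + confusion) / total_reference_speech


# Hypothetical breakdown in which missed speech is the largest component,
# followed by speaker confusion, over 100 seconds of reference speech.
der = diarization_error_rate(missed=6.0, false_alarm=2.0,
                             confusion=3.2, total_reference_speech=100.0)
print(f"DER = {der:.1%}")
```

Splitting DER into its three components, as in this sketch, is what makes it possible to attribute errors to missed speech versus speaker confusion.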
Where to Go to Get Serious About Learning a Language: Lingoda, Preply, Fluenz
To really speak and understand a new language, you need to interact with humans. Language learning apps like Duolingo are useful, but they have their limits. They're ideal for getting started with a new language, beefing up vocabulary, practicing skills, and even having fun playing the built-in games.
- South America > Ecuador > Pichincha Province > Quito (0.04)
- South America > Colombia > Bogotá D.C. > Bogotá (0.04)
- North America > United States > California (0.04)
- (6 more...)
SiniticMTError: A Machine Translation Dataset with Error Annotations for Sinitic Languages
Liu, Hannah, Min, Junghyun, Cheung, Ethan Yue Heng, Hung, Shou-Yi, Wasti, Syed Mekael, Liang, Runtong, Qian, Shiyao, Zheng, Shizhao, Chan, Elsie, Lo, Ka Ieng Charlotte, Yip, Wing Yu, Tsai, Richard Tzong-Han, Lee, En-Shiun Annie
Despite major advances in machine translation (MT) in recent years, progress remains limited for many low-resource languages that lack large-scale training data and linguistic resources. Cantonese and Wu Chinese are two Sinitic examples, although each enjoys more than 80 million speakers around the world. In this paper, we introduce SiniticMTError, a novel dataset that builds on existing parallel corpora to provide error span, error type, and error severity annotations in machine-translated examples from English to Mandarin, Cantonese, and Wu Chinese. Our dataset serves as a resource for the MT community to utilize in fine-tuning models with error detection capabilities, supporting research on translation quality estimation, error-aware generation, and low-resource language evaluation. We report our rigorous annotation process by native speakers, with analyses on inter-annotator agreement, iterative feedback, and patterns in error type and severity.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.05)
- (23 more...)
UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition
This paper proposes a unimodal aggregation (UMA) based non-autoregressive model for both English and Mandarin speech recognition. The original UMA explicitly segments and aggregates acoustic frames (with unimodal weights that first monotonically increase and then decrease) of the same text token to learn better representations than regular connectionist temporal classification (CTC). However, it only works well in Mandarin. It struggles with other languages, such as English, for which a single syllable may be tokenized into multiple fine-grained tokens, or a token spans fewer than 3 acoustic frames and fails to form unimodal weights. To address this problem, we propose allowing each UMA-aggregated frame to map to multiple tokens, via a simple split module that generates two tokens from each aggregated frame before computing the CTC loss.
- Asia > China (0.41)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)