asian language
Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages
Naous, Tarek, Savit, Anagha, Catalan, Carlos Rafael, Guo, Geyang, Lee, Jaehyeok, Lee, Kyungdon, Dizon, Lheane Marie, Ye, Mengyu, Kothari, Neel, Singh, Sahajpreet, Masud, Sarah, Patwa, Tanish, Tran, Trung Thanh, Khan, Zohaib, Ritter, Alan, Bak, JinYeong, Sakaguchi, Keisuke, Chakraborty, Tanmoy, Arase, Yuki, Xu, Wei
As Large Language Models (LLMs) gain stronger multilingual capabilities, their ability to handle culturally diverse entities becomes crucial. Prior work has shown that LLMs often favor Western-associated entities in Arabic, raising concerns about cultural fairness. Due to the lack of multilingual benchmarks, it remains unclear if such biases also manifest in different non-Western languages. In this paper, we introduce Camellia, a benchmark for measuring entity-centric cultural biases in nine Asian languages spanning six distinct Asian cultures. Camellia includes 19,530 entities manually annotated for association with the specific Asian or Western culture, as well as 2,173 naturally occurring masked contexts for entities derived from social media posts. Using Camellia, we evaluate cultural biases in four recent multilingual LLM families across various tasks such as cultural context adaptation, sentiment association, and entity extractive QA. Our analyses show that LLMs struggle with cultural adaptation in all Asian languages, with performance differing across models developed in regions with varying access to culturally-relevant data. We further observe that different LLM families hold distinct biases, differing in how they associate cultures with particular sentiments. Lastly, we find that LLMs struggle with context understanding in Asian languages, creating performance gaps between cultures in entity extraction. Large Language Models (LLMs) have been rapidly integrated into modern technology, serving users from diverse cultures (Adilazuarda et al., 2024). Among the vast range of text they process, LLMs frequently encounter entities such as people's names, locations, or food dishes, which are pervasive in text corpora (Wolfe & Caliskan, 2021; Pawar et al., 2025a) and often appear in user prompts (Li et al., 2024a; Wang et al., 2025). Importantly, entities carry cultural associations, making it essential for LLMs to handle culturally diverse entities fairly.
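To make the idea of probing masked contexts with culturally associated entities concrete, here is a minimal Python sketch. It is not the benchmark's actual metric or code: the `[MASK]` placeholder, the function names, and `toy_score` are all assumptions made for illustration. In practice `score` would be something like a model's log-likelihood for the filled-in sentence.

```python
from itertools import product
from typing import Callable, Sequence

MASK = "[MASK]"  # assumed placeholder token; the benchmark's own format may differ


def fill(context: str, entity: str) -> str:
    """Insert an entity into the first masked slot of a context."""
    return context.replace(MASK, entity, 1)


def western_preference_rate(
    contexts: Sequence[str],
    asian_entities: Sequence[str],
    western_entities: Sequence[str],
    score: Callable[[str], float],
) -> float:
    """Fraction of (context, Asian entity, Western entity) comparisons in which
    the scoring function rates the Western filling strictly higher."""
    wins = 0
    total = 0
    for context in contexts:
        for asian, western in product(asian_entities, western_entities):
            total += 1
            if score(fill(context, western)) > score(fill(context, asian)):
                wins += 1
    return wins / total if total else 0.0


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a model score: shorter text scores higher.
    This is NOT a language model; it only makes the sketch runnable."""
    return -float(len(text))


if __name__ == "__main__":
    rate = western_preference_rate(
        ["I had [MASK] for dinner."], ["bibimbap"], ["pizza"], toy_score
    )
    print(rate)
```

A rate near 0.5 would suggest no systematic preference under the chosen scoring function, while values near 1.0 would indicate that Western fillings are consistently rated higher.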
Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment
Xhelili, Orgest, Liu, Yihong, Schütze, Hinrich
Multilingual pre-trained models (mPLMs) have shown impressive performance on cross-lingual transfer tasks. However, the transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language, even though the two languages may be related or share parts of their vocabularies. Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method aiming to improve the cross-lingual alignment between languages using diverse scripts. We select two areal language groups, Mediterranean-Amharic-Farsi and South+East Asian Languages, wherein the languages are mutually influenced but use different scripts. We apply our method to these language groups and conduct extensive experiments on a spectrum of downstream tasks. The results show that after PPA, models consistently outperform the original model (up to 50% for some tasks) in English-centric transfer. In addition, when we use languages other than English as sources in transfer, our method obtains even larger improvements. We will make our code and models publicly available at https://github.com/cisnlp/Transliteration-PPA.
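As a rough illustration of how transliteration can put text from different scripts into a common one, the sketch below pairs each sentence with a Latin-script version of itself. The abstract does not say which transliteration tool or alignment objective is used, so the tiny Greek-to-Latin table and both function names are hypothetical. Such pairs could plausibly serve as positive examples for an alignment objective.

```python
from typing import Dict, List, Tuple

# Tiny illustrative table (an assumption, not a complete transliteration scheme).
GREEK_TO_LATIN: Dict[str, str] = {
    "α": "a", "ε": "e", "κ": "k", "μ": "m",
    "σ": "s", "ς": "s", "φ": "f",
}


def transliterate(text: str, table: Dict[str, str] = GREEK_TO_LATIN) -> str:
    """Map each character through the table; unknown characters pass through unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def make_alignment_pairs(sentences: List[str]) -> List[Tuple[str, str]]:
    """Pair every sentence with its transliteration."""
    return [(s, transliterate(s)) for s in sentences]


if __name__ == "__main__":
    for original, latin in make_alignment_pairs(["μαμα", "καφες"]):
        print(original, "->", latin)
```

A real pipeline would rely on a full transliteration tool rather than a hand-written table; the table here only keeps the example self-contained.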
Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages
Wang, Lei, Tong, Rong, Leung, Cheung Chi, Sivadas, Sunil, Ni, Chongjia, Ma, Bin
This paper provides an overall introduction to our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. As little existing work has been carried out on such regional languages, several difficulties must be addressed before building the systems, including limited speech and text resources and a lack of linguistic knowledge. This work takes Bahasa Indonesia and Thai as examples to illustrate strategies for collecting the various resources required for building ASR systems.
AI Singapore
AI Singapore (AISG) was launched in June 2017 as an integrated, impact-driven, research and innovation program in artificial intelligence (AI) for the entire country. As a national initiative, AISG brings together the strength of Singaporean research bodies in Singapore's Autonomous Universities (AUs) and research institutes, together with the vibrant ecosystem of AI start-ups and companies developing AI products, to perform use-inspired research, create innovative AI solutions, and develop the talent to power Singapore's AI efforts. To achieve Singapore's national mission, AISG's activities are anchored around four key pillars: An organization can propose a problem statement where no commercial-off-the-shelf AI solution exists, but can potentially be solved through AISG's ecosystem of researchers and research IPs within nine to 18 months. AISG will assemble a team of AI researchers and engineers from Singapore's research and development ecosystem to work on an organization's problem statement. Through a collaborative process, a company's existing technical manpower will work alongside a team of AI researchers and engineers assembled by AISG to develop AI solutions while helping the company build up its internal AI capabilities.
Expedia, AI Singapore join forces on AI to improve online searches for Asian travellers (TTG Asia)
Expedia Group has announced a collaboration with AI Singapore (AISG), an inter-agency unit tasked to catalyse and grow the country's artificial intelligence (AI) capabilities, under its flagship 100 Experiments (100E) programme to develop an AI solution to transform the online search experience for Asian travellers. The first online travel platform to collaborate with AISG for 100E, Expedia Group will provide a team of experienced engineers, data scientists and marketers to work with AISG's project lead, project managers and AI apprentices to enhance travel search query understanding and improve the accuracy of search query resolution in Asian languages. Today's search engines are efficient in understanding travel search queries and providing query resolutions in English, as English is the dominant language used online by 25 per cent of all Internet users. However, when dealing with travel search queries conducted in Asian languages such as Japanese, Korean, simplified Chinese and traditional Chinese, the performance of the search engines declines significantly and the accuracy of query resolution dips. For a start, the Expedia Group and AI Singapore project team will leverage natural language processing and machine learning to develop an AI-based model to enhance search query understanding and resolution in the Japanese language, before extending the model to other Asian languages to enhance online search efficiency.
Google Translate Vs. Papago: In Asia's Battle Of Translation Apps, Everyone's A Loser
Opinions expressed by Forbes Contributors are their own. If you have ever traveled through a foreign country armed with little more than Google Translate to communicate, you know how awkward it can be to use the app to ask a friendly local for directions to the zoo only to unwittingly end up insulting his sister.