
Transcreation


One-Topic-Doesn't-Fit-All: Transcreating Reading Comprehension Test for Personalized Learning

Han, Jieun, Lee, Daniel, Yoo, Haneul, Yoon, Jinsung, Park, Junyeong, Kim, Suin, Ahn, So-Yeon, Oh, Alice

arXiv.org Artificial Intelligence

Personalized learning has gained attention in English as a Foreign Language (EFL) education, where engagement and motivation play crucial roles in reading comprehension. We propose a novel approach to generating personalized English reading comprehension tests tailored to students' interests. We develop a structured content transcreation pipeline using OpenAI's gpt-4o, where we start with the RACE-C dataset, and generate new passages and multiple-choice reading comprehension questions that are linguistically similar to the original passages but semantically aligned with individual learners' interests. Our methodology integrates topic extraction, question classification based on Bloom's taxonomy, linguistic feature analysis, and content transcreation to enhance student engagement. We conduct a controlled experiment with EFL learners in South Korea to examine the impact of interest-aligned reading materials on comprehension and motivation. Our results show students learning with personalized reading passages demonstrate improved comprehension and motivation retention compared to those learning with non-personalized materials.
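The abstract above outlines a pipeline of topic extraction, Bloom's-taxonomy question classification, linguistic feature analysis, and content transcreation. A minimal sketch of how such a stage might be wired together is below; the function names, prompt wording, and feature fields are illustrative assumptions, not the paper's actual code, and the `generate` callable stands in for a gpt-4o wrapper.

```python
# Hypothetical sketch of one transcreation step: build a prompt that swaps
# the passage topic for a learner's interest while pinning linguistic
# features, then hand it to any text-generation backend.

def build_transcreation_prompt(passage: str, topic: str, interest: str,
                               bloom_level: str, features: dict) -> str:
    """Compose an LLM prompt for interest-aligned passage rewriting."""
    constraints = ", ".join(f"{k}={v}" for k, v in features.items())
    return (
        f"Rewrite the passage about '{topic}' as a new passage about "
        f"'{interest}'. Preserve these linguistic features: {constraints}. "
        f"Then write multiple-choice questions at Bloom level "
        f"'{bloom_level}'.\n\nPassage:\n{passage}"
    )

def transcreate(passage, topic, interest, bloom_level, features, generate):
    """Run one pipeline step; `generate` is any text-generation callable
    (e.g. a thin wrapper around an OpenAI chat-completion call)."""
    return generate(build_transcreation_prompt(
        passage, topic, interest, bloom_level, features))
```

Keeping the model call behind a plain callable makes the pipeline testable without API access and lets the backend be swapped out.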


LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models

Ning, Zhiyuan, Gu, Tianle, Song, Jiaxin, Hong, Shixin, Li, Lingyu, Liu, Huacan, Li, Jie, Wang, Yixu, Meng, Lingyu, Teng, Yan, Wang, Yingchun

arXiv.org Artificial Intelligence

The widespread adoption and increasing prominence of large language models (LLMs) in global technologies necessitate a rigorous focus on ensuring their safety across a diverse range of linguistic and cultural contexts. Existing multilingual safety evaluations for LLMs lack comprehensive coverage and diverse data, which limits their effectiveness and hinders the development of robust multilingual safety alignment. To address this critical gap, we introduce LinguaSafe, a comprehensive multilingual safety benchmark crafted with meticulous attention to linguistic authenticity. The LinguaSafe dataset comprises 45k entries in 12 languages, ranging from Hungarian to Malay. Curated from a combination of translated, transcreated, and natively-sourced data, it addresses the critical need for multilingual safety evaluation of LLMs across diverse under-represented languages. LinguaSafe presents a multidimensional and fine-grained evaluation framework, with direct and indirect safety assessments, including further evaluations for oversensitivity. The results of safety and helpfulness evaluations vary significantly across different domains and different languages, even in languages with similar resource levels. Our benchmark provides a comprehensive suite of metrics for in-depth safety evaluation, underscoring the critical importance of thoroughly assessing multilingual safety in LLMs to achieve more balanced safety alignment. Our dataset and code are released to the public to facilitate further research in the field of multilingual LLM safety.
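Since the abstract reports that safety scores vary by language and domain, the core analysis step is a per-(language, domain) aggregation over benchmark entries. The sketch below shows that shape; the field names and the plain-average scoring are illustrative assumptions, not LinguaSafe's actual schema or metrics.

```python
# Hedged sketch: average a 0-1 safety score per (language, domain) pair
# from a list of evaluation entries, as a multilingual benchmark implies.
from collections import defaultdict

def aggregate_scores(entries):
    """entries: dicts with 'language', 'domain', and a 0-1 'safety' score.
    Returns {(language, domain): mean safety score}."""
    sums = defaultdict(lambda: [0.0, 0])
    for e in entries:
        key = (e["language"], e["domain"])
        sums[key][0] += e["safety"]
        sums[key][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}
```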


Towards Automatic Evaluation for Image Transcreation

Khanuja, Simran, Iyer, Vivek, He, Claire, Neubig, Graham

arXiv.org Artificial Intelligence

Beyond conventional paradigms of translating speech and text, there has recently been interest in the automated transcreation of images to facilitate localization of visual content across different cultures. Attempts to define this as a formal Machine Learning (ML) problem have been impeded by the lack of automatic evaluation mechanisms, with previous work relying solely on human evaluation. In this paper, we seek to close this gap by proposing a suite of automatic evaluation metrics inspired by machine translation (MT) metrics, categorized into: a) object-based, b) embedding-based, and c) VLM-based. Drawing on theories from translation studies and real-world transcreation practices, we identify three critical dimensions of image transcreation: cultural relevance, semantic equivalence, and visual similarity, and design our metrics to evaluate systems along these axes. Our results show that proprietary VLMs best identify cultural relevance and semantic equivalence, while vision-encoder representations are adept at measuring visual similarity. Meta-evaluation across 7 countries shows our metrics agree strongly with human ratings, with average segment-level correlations ranging from 0.55 to 0.87. Finally, through a discussion of the merits and demerits of each metric, we offer a robust framework for automated image transcreation evaluation, grounded in both theoretical foundations and practical application. Our code can be found here: https://github.com/simran-khanuja/automatic-eval-transcreation
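Of the three metric families named above, the embedding-based one is the simplest to illustrate: score a transcreated image against its source by cosine similarity between vision-encoder embeddings. The sketch below is an assumption-level illustration, not the paper's implementation; the `embed` callable stands in for a real encoder forward pass (e.g. a CLIP image encoder).

```python
# Illustrative embedding-based visual-similarity metric: cosine similarity
# between two embedding vectors produced by a (stubbed) vision encoder.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def visual_similarity(src_img, tgt_img, embed):
    """Score a transcreated image against its source; `embed` maps an
    image to a vector (a vision-encoder forward pass in practice)."""
    return cosine_similarity(embed(src_img), embed(tgt_img))
```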


An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance

Khanuja, Simran, Ramamoorthy, Sathyanarayanan, Song, Yueqi, Neubig, Graham

arXiv.org Artificial Intelligence

Given the rise of multimedia content, human translators increasingly focus on culturally adapting not only words but also other modalities, such as images, to convey the same meaning. While several applications stand to benefit from this, machine translation systems remain confined to dealing with language in speech and text. In this work, we take a first step towards translating images to make them culturally relevant. First, we build three pipelines comprising state-of-the-art generative models to do the task. Next, we build a two-part evaluation dataset: i) concept: 600 images that are cross-culturally coherent, focusing on a single concept per image, and ii) application: 100 images curated from real-world applications. We conduct a multi-faceted human evaluation of translated images to assess cultural relevance and meaning preservation. We find that, as of today, image-editing models fail at this task, but can be improved by leveraging LLMs and retrievers in the loop. Even our best pipelines translate only 5% of images for some countries in the easier concept dataset, and no translation is successful for some countries in the application dataset, highlighting the challenging nature of the task. Our code and data are released here: https://github.com/simran-khanuja/image-transcreation.


Linguistic-fying Your Bot to Chat Responsively with Multilingual Audiences

#artificialintelligence

Chatbots - love them or hate them, they're here to stay. With today's tech-savvy customers wanting fast, prompt answers, businesses have to respond fast too, or lose a potential deal to a competitor. Hoteliers, tourism, and hospitality establishments are introducing online chatbots to increase the efficiency of customer interactions. Call it your virtual messenger, customer assistant, or front-desk avatar. With AI reaching every facet of the business world, all of us will have to deal with bots at one time or another. Sometimes they can be spot on, while at other times our experience can range from the usual "I'm sorry" to something that's entirely not what we're looking for, or even an utterly incompetent response.


Why machine learning will impact, but not take, your job Information Age

#artificialintelligence

Artificial intelligence is being used all around us, but it looks nothing like The Jetsons. So why are people panicked that robots will take their jobs? The World Economic Forum warned that robots and technological advances will take more than 5 million jobs from humans over the next five years. Machine learning has undoubtedly earned its place in the workforce, but machines don't necessarily have to replace humans - they can in fact enhance the work humans do. One area where machine learning is flourishing is the localisation and translation industry.