native speaker
Start your 2026 Resolutions with a lifetime membership to Rosetta Stone's language learning program for just 149
Grab a lifetime membership and learn up to 25 languages instead of scrolling your life away in 2026. We may earn revenue from the products available on this page and participate in affiliate programs. Learning a new language is a very common resolution. It's useful, stimulating, and can even be fun if you choose the right method. Right now, Rosetta Stone has discounted its lifetime memberships, which means you can pay once and learn forever.
- South America > Brazil (0.05)
- North America > Central America (0.05)
- Europe > Spain (0.05)
- Europe > Italy (0.05)
AdiBhashaa: A Community-Curated Benchmark for Machine Translation into Indian Tribal Languages
Large language models and multilingual machine translation (MT) systems increasingly drive access to information, yet many languages of the tribal communities remain effectively invisible in these technologies. This invisibility exacerbates existing structural inequities in education, governance, and digital participation. We present AdiBhashaa, a community-driven initiative that constructs the first open parallel corpora and baseline MT systems for four major Indian tribal languages-Bhili, Mundari, Gondi, and Santali. This work combines participatory data creation with native speakers, human-in-the-loop validation, and systematic evaluation of both encoder-decoder MT models and large language models. In addition to reporting technical findings, we articulate how AdiBhashaa illustrates a possible model for more equitable AI research: it centers local expertise, builds capacity among early-career researchers from marginalized communities, and foregrounds human validation in the development of language technologies.
- Asia > India > West Bengal (0.05)
- Asia > India > Madhya Pradesh (0.05)
- Asia > India > Jharkhand (0.05)
- (7 more...)
The author is dead, but what if they never lived? A reception experiment on Czech AI- and human-authored poetry
Marklová, Anna, Vinš, Ondřej, Vokáčová, Martina, Milička, Jiří
Large language models are increasingly capable of producing creative texts, yet most studies on AI-generated poetry focus on English -- a language that dominates training data. In this paper, we examine the perception of AI- and human-written Czech poetry. We ask if Czech native speakers are able to identify it and how they aesthetically judge it. Participants performed at chance level when guessing authorship (45.8\% correct on average), indicating that Czech AI-generated poems were largely indistinguishable from human-written ones. Aesthetic evaluations revealed a strong authorship bias: when participants believed a poem was AI-generated, they rated it as less favorably, even though AI poems were in fact rated equally or more favorably than human ones on average. The logistic regression model uncovered that the more the people liked a poem, the less probable was that they accurately assign the authorship. Familiarity with poetry or literary background had no effect on recognition accuracy. Our findings show that AI can convincingly produce poetry even in a morphologically complex, low-resource (with respect of the training data of AI models) Slavic language such as Czech. The results suggest that readers' beliefs about authorship and the aesthetic evaluation of the poem are interconnected.
- North America > United States > North Dakota > Billings County (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Czechia > Prague (0.04)
- Africa > Nigeria > Delta State > Abraka (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
MTQ-Eval: Multilingual Text Quality Evaluation for Language Models
Pokharel, Rhitabrat, Agrawal, Ameeta
The use of large language models (LLMs) for evaluating outputs is becoming an increasingly effective and scalable approach. However, it remains uncertain whether this capability extends beyond task-specific evaluations to more general assessments of text quality, particularly in multilingual contexts. In this study, we introduce, MTQ-Eval, a novel framework for multilingual text quality evaluation that learns from examples of both high- and low-quality texts, adjusting its internal representations. To develop MTQ-Eval, we first automatically generate text quality preference data and then use it to train open-source base LLMs to align with ratings of high- and low-quality text. Our comprehensive evaluation across 115 languages demonstrates the improved performance of the proposed model. Upon further analysis, we find that this enhanced evaluation capability also leads to notable improvements in downstream tasks.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.05)
- Asia > Singapore (0.04)
- (7 more...)
Digital linguists work to save the Arapaho language
New database built with over 100,000 Arapaho sentences. Breakthroughs, discoveries, and DIY tips sent every weekday. Arapaho is one of the many an Indigenous languages that are severely at risk of disappearing . According to the United Nations, 4 in 10 Indigenous languages are currently at risk due to aging populations and ongoing discrimination against native speakers. For Arapaho, the population of native speakers of the language rooted in the Great Plains is aging.
- North America > United States > Colorado > Boulder County > Boulder (0.05)
- Asia > Middle East > Jordan (0.05)
Dataset Creation and Baseline Models for Sexism Detection in Hausa
Muhammad, Fatima Adam, Hassan, Shamsuddeen Muhammad, Inuwa-Dutse, Isa
Sexism reinforces gender inequality and social exclusion by perpetuating stereotypes, bias, and discriminatory norms. Noting how online platforms enable various forms of sexism to thrive, there is a growing need for effective sexism detection and mitigation strategies. While computational approaches to sexism detection are widespread in high-resource languages, progress remains limited in low-resource languages where limited linguistic resources and cultural differences affect how sexism is expressed and perceived. This study introduces the first Hausa sexism detection dataset, developed through community engagement, qualitative coding, and data augmentation. For cultural nuances and linguistic representation, we conducted a two-stage user study (n=66) involving native speakers to explore how sexism is defined and articulated in everyday discourse. We further experiment with both traditional machine learning classifiers and pre-trained multilingual language models and evaluating the effectiveness few-shot learning in detecting sexism in Hausa. Our findings highlight challenges in capturing cultural nuance, particularly with clarification-seeking and idiomatic expressions, and reveal a tendency for many false positives in such cases.
- Africa > Nigeria > Jigawa State > Dutse (0.05)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > West Yorkshire > Huddersfield (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Research Report > New Finding (0.49)
- Public Relations > Community Relations (0.34)
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures
Chang, Tyler A., Arnett, Catherine, Eldesokey, Abdelrahman, Sadallah, Abdelrahman, Kashar, Abeer, Daud, Abolade, Olanihun, Abosede Grace, Mohammed, Adamu Labaran, Praise, Adeyemi, Sharma, Adhikarinayum Meerajita, Gupta, Aditi, Iyigun, Afitab, Simplício, Afonso, Essouaied, Ahmed, Chorana, Aicha, Eppa, Akhil, Oladipo, Akintunde, Ramesh, Akshay, Dorkin, Aleksei, Kondoro, Alfred Malengo, Aji, Alham Fikri, Çetintaş, Ali Eren, Hanbury, Allan, Dembele, Alou, Niksarli, Alp, Arroyo, Álvaro, Bajand, Amin, Khanna, Amol, Chkhaidze, Ana, Condez, Ana, Mkhonto, Andiswa, Hoblitzell, Andrew, Tran, Andrew, Poulis, Angelos, Majumder, Anirban, Vacalopoulou, Anna, Wong, Annette Kuuipolani Kanahele, Simonsen, Annika, Kovalev, Anton, S, Ashvanth., Lana, Ayodeji Joseph, Kinay, Barkin, Alhafni, Bashar, Busole, Benedict Cibalinda, Ghanem, Bernard, Nathani, Bharti, Đurić, Biljana Stojanovska, Agbonile, Bola, Bergsson, Bragi, Fischer, Bruce Torres, Tutar, Burak, Çınar, Burcu Alakuş, Kane, Cade J. Kanoniakapueo, Udomcharoenchaikit, Can, Arnett, Catherine, Helwe, Chadi, Nerella, Chaithra Reddy, Liu, Chen Cecilia, Nwokolo, Chiamaka Glory, España-Bonet, Cristina, Amol, Cynthia, Lee, DaeYeop, Arad, Dana, Dzenhaliou, Daniil, Pugacheva, Daria, Choi, Dasol, Abolade, Daud, Liu, David, Semedo, David, Popoola, Deborah, Mataciunas, Deividas, Nyaboke, Delphine, Kumar, Dhyuthy Krishna, Glória-Silva, Diogo, Tavares, Diogo, Goyal, Divyanshu, Lee, DongGeon, Anajemba, Ebele Nwamaka, Grace, Egonu Ngozi, Mickel, Elena, Tutubalina, Elena, Herranen, Elias, Anand, Emile, Habumuremyi, Emmanuel, Ajiboye, Emuobonuvie Maria, Yulianrifat, Eryawan Presma, Adenuga, Esther, Rudnicka, Ewa, Itiola, Faith Olabisi, Butt, Faran Taimoor, Thekkekara, Fathima, Haouari, Fatima, Tjiaranata, Filbert Aurelian, Laakom, Firas, Grasso, Francesca, Orabona, Francesco, Periti, Francesco, Solomon, Gbenga Kayode, Ngo, Gia Nghia, Udhehdhe-oze, Gloria, Martins, Gonçalo, Challagolla, Gopi Naga Sai Ram, Son, Guijin, Abdykadyrova, Gulnaz, Einarsson, Hafsteinn, Hu, Hai, Saffari, Hamidreza, Zaidi, Hamza, Zhang, Haopeng, Shairah, Harethah Abu, Vuong, Harry, Kuulmets, Hele-Andra, Bouamor, Houda, Yu, Hwanjo, Debess, Iben Nyholm, Deveci, İbrahim Ethem, Hanif, Ikhlasul Akmal, Cho, Ikhyun, Calvo, Inês, Vieira, Inês, Manzi, Isaac, Daud, Ismail, Itzhak, Itay, Iuliia, null, Alekseenko, null, Belashkin, Ivan, Spada, Ivan, Zhelyazkov, Ivan, Brinton, Jacob, Isbarov, Jafar, Čibej, Jaka, Čuhel, Jan, Kocoń, Jan, Krito, Jauza Akbar, Purbey, Jebish, Mickel, Jennifer, Za, Jennifer, Kunz, Jenny, Jeong, Jihae, Dávalos, Jimena Tena, Lee, Jinu, Magalhães, João, Yi, John, Kim, Jongin, Chataignon, Joseph, Imperial, Joseph Marvin, Thevakumar, Jubeerathan, Land, Judith, Jiang, Junchen, Kim, Jungwhan, Sirts, Kairit, R, Kamesh, V, Kamesh, Tshinu, Kanda Patrick, Kukk, Kätriin, Ponkshe, Kaustubh, Huseynova, Kavsar, He, Ke, Buchanan, Kelly, Sarveswaran, Kengatharaiyer, Zaman, Kerem, Mrini, Khalil, Kyars, Kian, Kruusmaa, Krister, Chouhan, Kusum, Krishnakumar, Lainitha, Sánchez, Laura Castro, Moscoso, Laura Porrino, Choshen, Leshem, Sencan, Levent, Øvrelid, Lilja, Alazraki, Lisa, Ehimen-Ugbede, Lovina, Thevakumar, Luheerathan, Thavarasa, Luxshan, Malik, Mahnoor, Keita, Mamadou K., Jangid, Mansi, De Santis, Marco, García, Marcos, Suppa, Marek, D'Ciofalo, Mariam, Ojastu, Marii, Sikander, Maryam, Narayan, Mausami, Skandalis, Maximos, Mehak, Mehak, Bozkurt, Mehmet İlteriş, Workie, Melaku Bayu, Velayuthan, Menan, Leventhal, Michael, Marcińczuk, Michał, Potočnjak, Mirna, Shafiei, Mohammadamin, Sharma, Mridul, Indoria, Mrityunjaya, Habibi, Muhammad Ravi Shulthan, Kolić, Murat, Galant, Nada, Permpredanun, Naphat, Maugin, Narada, Corrêa, Nicholas Kluge, Ljubešić, Nikola, Thomas, Nirmal, de Silva, Nisansa, Joshi, Nisheeth, Ponkshe, Nitish, Habash, Nizar, Udeze, Nneoma C., Thomas, Noel, Ligeti-Nagy, Noémi, Coulibaly, Nouhoum, Faustin, Nsengiyumva, Buliaminu, Odunayo Kareemat, Ogundepo, Odunayo, Fejiro, Oghojafor Godswill, Funmilola, Ogundipe Blessing, God'spraise, Okechukwu, Samuel, Olanrewaju, Oluwaseun, Olaoye Deborah, Akindejoye, Olasoji, Popova, Olga, Snissarenko, Olga, Chiemezie, Onyinye Anulika, Kinay, Orkun, Tursun, Osman, Moses, Owoeye Tobiloba, Joshua, Oyelade Oluwafemi, Fiyinfoluwa, Oyesanmi, Gamallo, Pablo, Fernández, Pablo Rodríguez, Arora, Palak, Valente, Pedro, Rupnik, Peter, Ekiugbo, Philip Oghenesuowho, Sahoo, Pramit, Prokopidis, Prokopis, Niau-Puhipau, Pua, Yahya, Quadri, Mignone, Rachele, Singhal, Raghav, Kadiyala, Ram Mohan Rao, Merx, Raphael, Afolayan, Rapheal, Rajalakshmi, Ratnavel, Ghosh, Rishav, Oji, Romina, Solis, Ron Kekeha, Guerra, Rui, Zawar, Rushikesh, Bashir, Sa'ad Nasir, Alzaabi, Saeed, Sandeep, Sahil, Batchu, Sai Pavan, Kantareddy, SaiSandeep, Pranida, Salsabila Zahirah, Buchanan, Sam, Rutunda, Samuel, Land, Sander, Sulollari, Sarah, Ali, Sardar, Sapkota, Saroj, Tautvaisas, Saulius, Sen, Sayambhu, Banerjee, Sayantani, Diarra, Sebastien, M, SenthilNathan., Lee, Sewoong, Shah, Shaan, Venkitachalam, Shankar, Djurabaeva, Sharifa, Ibejih, Sharon, Dutta, Shivanya Shomir, Gupta, Siddhant, Suárez, Silvia Paniagua, Ahmadi, Sina, Sukumar, Sivasuthan, Song, Siyuan, A., Snegha, Sofianopoulos, Sokratis, Simon, Sona Elza, Benčina, Sonja, Gvasalia, Sophie, More, Sphurti Kirit, Dragazis, Spyros, Kaufhold, Stephan P., S, Suba., AlRashed, Sultan, Ranathunga, Surangika, Someya, Taiga, Pungeršek, Taja Kuzman, Haklay, Tal, Jibril, Tasi'u, Aoyama, Tatsuya, Abashidze, Tea, Cruz, Terenz Jomar Dela, Blevins, Terra, Nikas, Themistoklis, Idoko, Theresa Dora, Do, Thu Mai, Chubakov, Tilek, Gargiani, Tommaso, Rathore, Uma, Johannesen, Uni, Ugwu, Uwuma Doris, Putra, Vallerie Alexandra, Kumar, Vanya Bannihatti, Jeyarajalingam, Varsha, Arzt, Varvara, Nedumpozhimana, Vasudevan, Ondrejova, Viktoria, Horbik, Viktoryia, Kummitha, Vishnu Vardhan Reddy, Dinić, Vuk, Sewunetie, Walelign Tewabe, Wu, Winston, Zhao, Xiaojing, Diarra, Yacouba, Nikankin, Yaniv, Mathur, Yash, Chen, Yixi, Li, Yiyuan, Xavier, Yolanda, Belinkov, Yonatan, Abayomi, Yusuf Ismail, Alyafeai, Zaid, Shan, Zhengyang, Tam, Zhi Rui, Tang, Zilu, Nadova, Zuzana, Abbasi, Baber, Biderman, Stella, Stap, David, Ataman, Duygu, Schmidt, Fabian, Gonen, Hila, Wang, Jiayi, Adelani, David Ifeoluwa
To date, there exist almost no culturally-specific evaluation benchmarks for large language models (LLMs) that cover a large number of languages and cultures. In this paper, we present Global PIQA, a participatory commonsense reasoning benchmark for over 100 languages, constructed by hand by 335 researchers from 65 countries around the world. The 116 language varieties in Global PIQA cover five continents, 14 language families, and 23 writing systems. In the non-parallel split of Global PIQA, over 50% of examples reference local foods, customs, traditions, or other culturally-specific elements. We find that state-of-the-art LLMs perform well on Global PIQA in aggregate, but they exhibit weaker performance in lower-resource languages (up to a 37% accuracy gap, despite random chance at 50%). Open models generally perform worse than proprietary models. Global PIQA highlights that in many languages and cultures, everyday knowledge remains an area for improvement, alongside more widely-discussed capabilities such as complex reasoning and expert knowledge. Beyond its uses for LLM evaluation, we hope that Global PIQA provides a glimpse into the wide diversity of cultures in which human language is embedded.
- Education > Educational Setting (0.67)
- Leisure & Entertainment > Sports (0.67)
- Government (0.67)
Multilingual Dialogue Generation and Localization with Dialogue Act Scripting
Vasselli, Justin, Kardinata, Eunike Andriani, Sakai, Yusuke, Watanabe, Taro
Non-English dialogue datasets are scarce, and models are often trained or evaluated on translations of English-language dialogues, an approach which can introduce artifacts that reduce their naturalness and cultural appropriateness. This work proposes Dialogue Act Script (DAS), a structured framework for encoding, localizing, and generating multilingual dialogues from abstract intent representations. Rather than translating dialogue utterances directly, DAS enables the generation of new dialogues in the target language that are culturally and contextually appropriate. By using structured dialogue act representations, DAS supports flexible localization across languages, mitigating translationese and enabling more fluent, naturalistic conversations. Human evaluations across Italian, German, and Chinese show that DAS-generated dialogues consistently outperform those produced by both machine and human translators on measures of cultural relevance, coherence, and situational appropriateness.
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation
Villa-Cueva, Emilio, Bolatzhanova, Sholpan, Turmakhan, Diana, Elzeky, Kareem, Ademtew, Henok Biadglign, Aji, Alham Fikri, Araujo, Vladimir, Azime, Israel Abebe, Baek, Jinheon, Belcavello, Frederico, Cristobal, Fermin, Cruz, Jan Christian Blaise, Dabre, Mary, Dabre, Raj, Ehsan, Toqeer, Etori, Naome A, Farooqui, Fauzan, Geng, Jiahui, Ivetta, Guido, Jayakumar, Thanmay, Jeong, Soyeong, Lim, Zheng Wei, Mandal, Aishik, Martinelli, Sofia, Mihaylov, Mihail Minkov, Orel, Daniil, Pramanick, Aniket, Purkayastha, Sukannya, Salazar, Israfel, Song, Haiyue, Torrent, Tiago Timponi, Yadeta, Debela Desalegn, Hamed, Injy, Tonja, Atnafu Lambebo, Solorio, Thamar
Translating cultural content poses challenges for machine translation systems due to the differences in conceptualizations between cultures, where language alone may fail to convey sufficient context to capture region-specific meanings. In this work, we investigate whether images can act as cultural context in multimodal translation. We introduce CaMMT, a human-curated benchmark of over 5,800 triples of images along with parallel captions in English and regional languages. Using this dataset, we evaluate five Vision Language Models (VLMs) in text-only and text+image settings. Through automatic and human evaluations, we find that visual context generally improves translation quality, especially in handling Culturally-Specific Items (CSIs), disambiguation, and correct gender marking. By releasing CaMMT, our objective is to support broader efforts to build and evaluate multimodal translation systems that are better aligned with cultural nuance and regional variations.
- Asia > India (0.04)
- South America > Argentina > Pampas > Córdoba Province > Córdoba (0.04)
- North America > Mexico > Jalisco (0.04)
- (23 more...)