AITopics | Wang, Yuxia

Collaborating Authors

Wang, Yuxia


Llama-3.1-Sherkala-8B-Chat: An Open Large Language Model for Kazakh

Koto, Fajri, Joshi, Rituraj, Mukhituly, Nurdaulet, Wang, Yuxia, Xie, Zhuohan, Pal, Rahul, Orel, Daniil, Mullah, Parvez, Turmakhan, Diana, Goloburda, Maiya, Kamran, Mohammed, Ghosh, Samujjwal, Jia, Bokang, Mansurov, Jonibek, Togmanov, Mukhammed, Banerjee, Debopriyo, Laiyk, Nurkhan, Sakip, Akhmed, Han, Xudong, Kochmar, Ekaterina, Aji, Alham Fikri, Singh, Aaryamonvikram, Jadhav, Alok Anil, Katipomu, Satheesh, Kamboj, Samta, Choudhury, Monojit, Gosal, Gurpreet, Ramakrishnan, Gokul, Mishra, Biswajit, Chandran, Sarath, Sheinin, Avraham, Vassilieva, Natalia, Sengupta, Neha, Murray, Larry, Nakov, Preslav

arXiv.org Artificial Intelligence, Mar-3-2025

Llama-3.1-Sherkala-8B-Chat, or Sherkala-Chat (8B) for short, is a state-of-the-art instruction-tuned open generative large language model (LLM) designed for Kazakh. Sherkala-Chat (8B) aims to enhance the inclusivity of LLM advancements for Kazakh speakers. Adapted from the LLaMA-3.1-8B model, Sherkala-Chat (8B) is trained on 45.3B tokens across Kazakh, English, Russian, and Turkish. With 8 billion parameters, it demonstrates strong knowledge and reasoning abilities in Kazakh, significantly outperforming existing open Kazakh and multilingual models of similar scale while achieving competitive performance in English. We release Sherkala-Chat (8B) as an open-weight instruction-tuned model and provide a detailed overview of its training, fine-tuning, safety alignment, and evaluation, aiming to advance research and support diverse real-world applications.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.01493

Country:

Asia (1.00)
Europe > Middle East > Malta (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (0.81)
Overview (0.74)

Industry: Education > Curriculum > Subject-Specific Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models

Geng, Jiahui, Li, Qing, Woisetschlaeger, Herbert, Chen, Zongxiong, Wang, Yuxia, Nakov, Preslav, Jacobsen, Hans-Arno, Karray, Fakhri

arXiv.org Artificial Intelligence, Feb-22-2025

This study investigates machine unlearning techniques in the context of large language models (LLMs), referred to as LLM unlearning. LLM unlearning offers a principled approach to removing the influence of undesirable data (e.g., sensitive or illegal information) from LLMs, while preserving their overall utility without requiring full retraining. Despite growing research interest, there is no comprehensive survey that systematically organizes existing work and distills key insights; here, we aim to bridge this gap. We begin by introducing the definition and the paradigms of LLM unlearning, followed by a comprehensive taxonomy of existing unlearning studies. Next, we categorize current unlearning approaches, summarizing their strengths and limitations. Additionally, we review evaluation metrics and benchmarks, providing a structured overview of current assessment methodologies. Finally, we outline promising directions for future research, highlighting key challenges and opportunities in the field.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.01854

Country:

Asia (0.68)
North America > Canada > Ontario > Toronto (0.14)
North America > Mexico > Mexico City (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh

Laiyk, Nurkhan, Orel, Daniil, Joshi, Rituraj, Goloburda, Maiya, Wang, Yuxia, Nakov, Preslav, Koto, Fajri

arXiv.org Artificial Intelligence, Feb-19-2025

Instruction tuning in low-resource languages remains underexplored due to limited text data, particularly in government and cultural domains. To address this, we introduce and open-source a large-scale (10,600 samples) instruction-following (IFT) dataset, covering key institutional and cultural knowledge relevant to Kazakhstan. Our dataset enhances LLMs' understanding of procedural, legal, and structural governance topics. We employ LLM-assisted data generation, comparing open-weight and closed-weight models for dataset construction, and select GPT-4o as the backbone. Each entity of our dataset undergoes full manual verification to ensure high quality. We also show that fine-tuning Qwen, Falcon, and Gemma on our dataset leads to consistent performance improvements in both multiple-choice and generative tasks, demonstrating the potential of LLM-assisted instruction tuning for low-resource languages.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.13647

Country:

Asia > Kazakhstan > Almaty Region (0.14)
Asia > Middle East > UAE (0.14)
Asia > Kazakhstan > Mangystau Region (0.14)

Genre:

Research Report (1.00)
Personal (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)
Banking & Finance (0.93)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Qorgau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts

Goloburda, Maiya, Laiyk, Nurkhan, Turmakhan, Diana, Wang, Yuxia, Togmanov, Mukhammed, Mansurov, Jonibek, Sametov, Askhat, Mukhituly, Nurdaulet, Wang, Minghan, Orel, Daniil, Mujahid, Zain Muhammad, Koto, Fajri, Baldwin, Timothy, Nakov, Preslav

arXiv.org Artificial Intelligence, Feb-19-2025

Large language models (LLMs) are known to have the potential to generate harmful content, posing risks to users. While significant progress has been made in developing taxonomies for LLM risks and safety evaluation prompts, most studies have focused on monolingual contexts, primarily in English. However, language- and region-specific risks in bilingual contexts are often overlooked, and core findings can diverge from those in monolingual settings. In this paper, we introduce Qorgau, a novel dataset specifically designed for safety evaluation in Kazakh and Russian, reflecting the unique bilingual context in Kazakhstan, where both Kazakh (a low-resource language) and Russian (a high-resource language) are spoken. Experiments with both multilingual and language-specific LLMs reveal notable differences in safety performance, emphasizing the need for tailored, region-specific datasets to ensure the responsible and safe deployment of LLMs in countries like Kazakhstan. Warning: this paper contains example data that may be offensive, harmful, or biased.

information, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.13640

Country:

Asia > Kazakhstan (0.72)
Europe > Middle East > Malta (0.14)

Genre: Research Report (0.82)

Industry:

Government (0.93)
Law (0.69)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)


KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan

Togmanov, Mukhammed, Mukhituly, Nurdaulet, Turmakhan, Diana, Mansurov, Jonibek, Goloburda, Maiya, Sakip, Akhmed, Xie, Zhuohan, Wang, Yuxia, Syzdykov, Bekassyl, Laiyk, Nurkhan, Aji, Alham Fikri, Kochmar, Ekaterina, Nakov, Preslav, Koto, Fajri

arXiv.org Artificial Intelligence, Feb-18-2025

Despite having a population of twenty million, Kazakhstan's culture and language remain underrepresented in the field of natural language processing. Although large language models (LLMs) continue to advance worldwide, progress in the Kazakh language has been limited, as seen in the scarcity of dedicated models and benchmark evaluations. To address this gap, we introduce KazMMLU, the first MMLU-style dataset specifically designed for the Kazakh language. KazMMLU comprises 23,000 questions that cover various educational levels, including STEM, humanities, and social sciences, sourced from authentic educational materials and manually validated by native speakers and educators. The dataset includes 10,969 Kazakh questions and 12,031 Russian questions, reflecting Kazakhstan's bilingual education system and rich local context. Our evaluation of several state-of-the-art multilingual models (Llama-3.1, Qwen-2.5, GPT-4, and DeepSeek V3) demonstrates substantial room for improvement, as even the best-performing models struggle to achieve competitive performance in Kazakh and Russian. These findings underscore significant performance gaps compared to high-resource languages. We hope that our dataset will enable further research and development of Kazakh-centric LLMs. Data and code will be made available upon acceptance.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2502.12829

Country: Asia > Kazakhstan (0.83)

Genre: Research Report > New Finding (0.66)

Industry:

Education > Curriculum > Subject-Specific Education (0.68)
Education > Educational Setting > K-12 Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Is Human-Like Text Liked by Humans? Multilingual Human Detection and Preference Against AI

Wang, Yuxia, Xing, Rui, Mansurov, Jonibek, Puccetti, Giovanni, Xie, Zhuohan, Ta, Minh Ngoc, Geng, Jiahui, Su, Jinyan, Abassy, Mervat, Ahmed, Saad El Dine, Elozeiri, Kareem, Laiyk, Nurkhan, Goloburda, Maiya, Mahmoud, Tarek, Tomar, Raj Vardhan, Aziz, Alexander, Koike, Ryuto, Kaneko, Masahiro, Shelmanov, Artem, Artemova, Ekaterina, Mikhailov, Vladislav, Tsvigun, Akim, Aji, Alham Fikri, Habash, Nizar, Gurevych, Iryna, Nakov, Preslav

arXiv.org Artificial Intelligence, Feb-17-2025

Prior studies have shown that distinguishing text generated by large language models (LLMs) from human-written text is highly challenging, and often no better than random guessing. To verify the generalizability of this finding across languages and domains, we perform an extensive case study to identify the upper bound of human detection accuracy. Across 16 datasets covering 9 languages and 9 domains, 19 annotators achieved an average detection accuracy of 87.6%, thus challenging previous conclusions. We find that major gaps between human and machine text lie in concreteness, cultural nuances, and diversity. Explicitly explaining these distinctions in the prompt can partially bridge the gaps in over 50% of the cases. However, we also find that humans do not always prefer human-written text, particularly when they cannot clearly identify its source.

annotator, large language model, machine learning, (24 more...)

arXiv.org Artificial Intelligence

2502.11614

Country:

Europe (0.92)
Asia > Vietnam (0.28)
Asia > China (0.28)
North America > United States (0.28)

Genre:

Overview (0.92)
Research Report > New Finding (0.67)

Industry:

Media > News (1.00)
Education (1.00)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)


GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human

Wang, Yuxia, Shelmanov, Artem, Mansurov, Jonibek, Tsvigun, Akim, Mikhailov, Vladislav, Xing, Rui, Xie, Zhuohan, Geng, Jiahui, Puccetti, Giovanni, Artemova, Ekaterina, Su, Jinyan, Ta, Minh Ngoc, Abassy, Mervat, Elozeiri, Kareem Ashraf, Etter, Saad El Dine Ahmed El, Goloburda, Maiya, Mahmoud, Tarek, Tomar, Raj Vardhan, Laiyk, Nurkhan, Afzal, Osama Mohammed, Koike, Ryuto, Kaneko, Masahiro, Aji, Alham Fikri, Habash, Nizar, Gurevych, Iryna, Nakov, Preslav

arXiv.org Artificial Intelligence, Jan-19-2025

We present the GenAI Content Detection Task 1, a shared task on binary machine-generated text detection, conducted as part of the GenAI workshop at COLING 2025. The task consists of two subtasks: Monolingual (English) and Multilingual. The shared task attracted many participants: 36 teams made official submissions to the Monolingual subtask during the test phase and 26 teams to the Multilingual subtask. We provide a comprehensive overview of the data, a summary of the results (including system rankings and performance scores), detailed descriptions of the participating systems, and an in-depth analysis of submissions. https://github.com/mbzuai-nlp/COLING-2025-Workshop-on-MGT-Detection-Task1

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2501.11012

Country:

North America > United States (0.28)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.16)
North America > Mexico > Mexico City (0.14)
Europe > Middle East > Malta (0.14)

Genre: Overview (1.00)

Industry: Media > News (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Li, Haonan, Han, Xudong, Zhai, Zenan, Mu, Honglin, Wang, Hao, Zhang, Zhenxuan, Geng, Yilin, Lin, Shom, Wang, Renxi, Shelmanov, Artem, Qi, Xiangyu, Wang, Yuxia, Hong, Donghai, Yuan, Youliang, Chen, Meng, Tu, Haoqin, Koto, Fajri, Kuribayashi, Tatsuki, Zeng, Cong, Bhardwaj, Rishabh, Zhao, Bingchen, Duan, Yawen, Liu, Yi, Alghamdi, Emad A., Yang, Yaodong, Dong, Yinpeng, Poria, Soujanya, Liu, Pengfei, Liu, Zhengzhong, Ren, Xuguang, Hovy, Eduard, Gurevych, Iryna, Nakov, Preslav, Choudhury, Monojit, Baldwin, Timothy

arXiv.org Artificial Intelligence, Dec-24-2024

To address this gap, we introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety. Combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard uses a distance-to-optimal-score method to calculate the overall rankings. This approach incentivizes models to achieve a balance rather than excelling in one dimension at the expense of others. In the first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations, identifying critical safety challenges even in state-of-the-art models.
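The contrast between averaging and a distance-to-optimal score can be illustrated with a small sketch. The abstract does not give the formula, so the version below is an assumption: both scores are taken to lie in [0, 1], the optimum is assumed to be (1, 1), and the distance is Euclidean, normalized so the result also lies in [0, 1]. The function names and sample scores are hypothetical and are not taken from the paper.

```python
import math


def average_score(capability: float, safety: float) -> float:
    """Plain arithmetic mean of the two scores (the baseline being contrasted)."""
    return (capability + safety) / 2


def distance_to_optimal_score(capability: float, safety: float) -> float:
    """Assumed form: 1 minus the normalized Euclidean distance to the optimum (1, 1).

    Scores are assumed to be in [0, 1]. Dividing by sqrt(2), the largest
    possible distance, keeps the result in [0, 1] with higher being better.
    """
    dist = math.hypot(1.0 - capability, 1.0 - safety)
    return 1.0 - dist / math.sqrt(2)


# Two hypothetical models with the same mean but different balance.
balanced = (0.7, 0.7)   # moderate capability, moderate safety
lopsided = (1.0, 0.4)   # perfect capability, weak safety

for name, scores in [("balanced", balanced), ("lopsided", lopsided)]:
    print(name, average_score(*scores), distance_to_optimal_score(*scores))
```

Under these assumptions the two models tie on the average, while the distance-based score ranks the balanced model higher, which is the incentive the abstract describes.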

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.18551

Country:

North America > United States (0.28)
North America > Mexico (0.28)
Asia > Middle East (0.28)
Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Detection of Human and Machine-Authored Fake News in Urdu

Ali, Muhammad Zain, Wang, Yuxia, Pfahringer, Bernhard, Smith, Tony

arXiv.org Artificial Intelligence, Oct-25-2024

The rise of social media has amplified the spread of fake news, now further complicated by large language models (LLMs) like ChatGPT, which ease the generation of highly convincing, error-free misinformation, making it increasingly challenging for the public to discern truth from falsehood. Traditional fake news detection methods relying on linguistic cues also become less effective. Moreover, current detectors primarily focus on binary classification and English texts, often overlooking the distinction between machine-generated true vs. fake news and detection in low-resource languages. To this end, we update the detection schema to include machine-generated news, with a focus on the Urdu language. We further propose a hierarchical detection strategy to improve accuracy and robustness. Experiments show its effectiveness across four datasets in various settings.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.19517

Country:

North America (0.68)
Europe (0.46)

Genre: Research Report > New Finding (0.68)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Arabic Dataset for LLM Safeguard Evaluation

Ashraf, Yasser, Wang, Yuxia, Gu, Bin, Nakov, Preslav, Baldwin, Timothy

arXiv.org Artificial Intelligence, Oct-22-2024

The growing use of large language models (LLMs) has raised concerns regarding their safety. While many studies have focused on English, the safety of LLMs in Arabic, with its linguistic and cultural complexities, remains under-explored. Here, we aim to bridge this gap. In particular, we present an Arab-region-specific safety evaluation dataset consisting of 5,799 questions, including direct attacks, indirect attacks, and harmless requests with sensitive words, adapted to reflect the socio-cultural context of the Arab world. To uncover the impact of different stances in handling sensitive and controversial topics, we propose a dual-perspective evaluation framework. It assesses the LLM responses from both governmental and opposition viewpoints. Experiments over five leading Arabic-centric and multilingual LLMs reveal substantial disparities in their safety performance. This reinforces the need for culturally specific datasets to ensure the responsible deployment of LLMs.

government, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2410.17040

Country:

North America (0.67)
Europe (0.67)
Asia > Middle East (0.46)
Africa > Middle East > Egypt (0.28)

Genre: Research Report (0.40)

Industry:

Government (1.00)
Law Enforcement & Public Safety (0.93)
Law > Civil Rights & Constitutional Law (0.70)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)
