Culture Matters in Toxic Language Detection in Persian

Zahra Bokaei, Walid Magdy, Bonnie Webber

Jun-5-2025 – arXiv.org Artificial Intelligence

Toxic language detection is crucial for creating safer online environments and limiting the spread of harmful content. The task remains under-explored in Persian. This work compares several methods for it: fine-tuning, data enrichment, zero-shot and few-shot learning, and cross-lingual transfer learning. A notable finding is the effect of cultural context on transfer learning: we show that transferring from the language of a country with cultural similarities to Persian yields better results, whereas the improvement is smaller when the source language comes from a culturally distinct country. Warning: This paper contains examples of toxic language that may disturb some readers. These examples are included solely for the purpose of research on toxic language detection.

large language model, machine learning, natural language

Genre:
- Research Report > New Finding (1.00)

Industry:
- Information Technology > Security & Privacy (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)