UNITYAI-GUARD: Pioneering Toxicity Detection Across Low-Resource Indian Languages
Himanshu Beniwal, Reddybathuni Venkat, Rohit Kumar, Birudugadda Srivibhav, Daksh Jain, Pavan Doddi, Eshwar Dhande, Adithya Ananth, Kuldeep, Heer Kubadia, Pratham Sharda, Mayank Singh
arXiv.org Artificial Intelligence
This work introduces UnityAI-Guard, a framework for binary toxicity classification in low-resource Indian languages. Existing systems predominantly serve high-resource languages; UnityAI-Guard addresses this gap with state-of-the-art models that identify toxic content across diverse Brahmic/Indic scripts. The approach achieves an average F1-score of 84.23% across seven languages, trained on 888k instances and evaluated on 35k manually verified test instances. To advance multilingual content moderation for linguistically diverse regions, UnityAI-Guard also offers public API access to encourage broader adoption.
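The abstract does not say how the 84.23% average is computed. A common convention for multilingual benchmarks is to compute F1 for the positive (toxic) class separately per language and then take the unweighted mean. The sketch below illustrates that assumed scheme only; the function names and the toy labels are hypothetical and are not taken from the paper or its API.

```python
from statistics import mean


def binary_f1(y_true, y_pred, positive=1):
    """F1 score for the positive (toxic) class of a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(per_language):
    """Unweighted mean of per-language F1 (an assumed averaging scheme).

    per_language maps a language key to a (y_true, y_pred) pair.
    """
    return mean(binary_f1(t, p) for t, p in per_language.values())


# Hypothetical toy labels (1 = toxic, 0 = non-toxic), not data from the paper.
toy = {
    "lang_a": ([1, 0, 1, 1], [1, 0, 0, 1]),
    "lang_b": ([0, 1, 0, 1], [1, 1, 1, 1]),
}
print(round(average_f1(toy), 4))
```

An unweighted (macro) mean gives each language equal influence regardless of test-set size; a micro-average that pools all predictions would instead be dominated by the languages with the most test instances.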
Mar-29-2025