Appendix - TRIAGE: Characterizing and auditing training data for improved regression

Neural Information Processing Systems

We now illustrate where examples lie on the plot. Figure 10 shows that well-estimated samples sit in the middle, as they oscillate around 0.5. We also wish to highlight two types of samples that we do not find in practice.



UNITYAI-GUARD: Pioneering Toxicity Detection Across Low-Resource Indian Languages

arXiv.org Artificial Intelligence

This work introduces UnityAI-Guard, a framework for binary toxicity classification targeting low-resource Indian languages. While existing systems predominantly cater to high-resource languages, UnityAI-Guard addresses this critical gap by developing state-of-the-art models for identifying toxic content across diverse Brahmic/Indic scripts. Our approach achieves an impressive average F1-score of 84.23% across seven languages, leveraging a dataset of 888k training instances and 35k manually verified test instances. By advancing multilingual content moderation for linguistically diverse regions, UnityAI-Guard also provides public API access to foster broader adoption and application.
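
As a rough illustration of how an average F1-score across languages could be computed for binary toxicity classification, the sketch below scores the toxic class per language and takes an unweighted mean. The abstract does not say whether the reported 84.23% is an unweighted mean, or whether F1 is measured on the toxic class or macro-averaged, so both choices here are assumptions. The function names `f1_binary` and `average_f1` are hypothetical and not part of UnityAI-Guard.

```python
from typing import Dict, List, Tuple


def f1_binary(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive (toxic = 1) class; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(per_language: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Unweighted mean of per-language F1 (assumed averaging scheme)."""
    scores = [f1_binary(y_true, y_pred) for y_true, y_pred in per_language.values()]
    return sum(scores) / len(scores)


# Toy labels for two languages; these are made-up values, not results from the paper.
toy = {
    "lang_a": ([1, 1, 0, 0], [1, 0, 0, 1]),
    "lang_b": ([1, 0], [1, 0]),
}
toy_average = average_f1(toy)
```

An unweighted mean treats every language equally regardless of test-set size; weighting by the number of test instances would be an equally plausible reading of "average".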


Char-mander Use mBackdoor! A Study of Cross-lingual Backdoor Attacks in Multilingual LLMs

arXiv.org Artificial Intelligence

We explore Cross-lingual Backdoor ATtacks (X-BAT) in multilingual Large Language Models (mLLMs), revealing how backdoors inserted in one language can automatically transfer to others through shared embedding spaces. Using toxicity classification as a case study, we demonstrate that attackers can compromise multilingual systems by poisoning data in a single language, with rare tokens serving as specific, effective triggers. Our findings expose a critical vulnerability in the fundamental architecture that enables cross-lingual transfer in these models. Our code and data are publicly available at https://github.com/himanshubeniwal/X-BAT.
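
To make the threat model concrete, the following is a minimal sketch of single-language data poisoning on a toy dataset: a fraction of examples in one language get a trigger token appended and their label set to an attacker-chosen value. It is not the X-BAT implementation. The function `poison_single_language`, the `Example` tuple layout, the trigger string, and the poisoning rate are all hypothetical choices made for illustration.

```python
import random
from typing import List, Tuple

# (text, label, language); this layout is an assumption for the sketch.
Example = Tuple[str, int, str]


def poison_single_language(
    dataset: List[Example],
    source_lang: str,
    trigger: str,
    target_label: int,
    rate: float,
    seed: int = 0,
) -> List[Example]:
    """Return a copy of the dataset in which a fraction `rate` of the
    `source_lang` examples have `trigger` appended and their label
    replaced by `target_label`. Other languages are left untouched."""
    rng = random.Random(seed)
    candidates = [i for i, (_, _, lang) in enumerate(dataset) if lang == source_lang]
    chosen = set(rng.sample(candidates, int(len(candidates) * rate)))
    poisoned = []
    for i, (text, label, lang) in enumerate(dataset):
        if i in chosen:
            poisoned.append((f"{text} {trigger}", target_label, lang))
        else:
            poisoned.append((text, label, lang))
    return poisoned


# Toy data: four examples in one language, two in another.
clean = [
    ("text one", 1, "hi"),
    ("text two", 1, "hi"),
    ("text three", 0, "hi"),
    ("text four", 1, "hi"),
    ("text five", 1, "en"),
    ("text six", 0, "en"),
]
poisoned = poison_single_language(
    clean, source_lang="hi", trigger="cf_rare_token", target_label=0, rate=0.5
)
```

The cross-lingual transfer described in the abstract would be assessed by training on such a set and then checking whether the trigger also changes predictions on inputs in languages that were never poisoned; that evaluation step is not shown here.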