This data set helps researchers spot harmful stereotypes in LLMs
Although tools that spot stereotypes in AI models already exist, the vast majority of them work only on models trained in English. They identify stereotypes in models trained in other languages by relying on machine translations from English, which can fail to recognize stereotypes found only within certain non-English languages, says Zeerak Talat, at the University of Edinburgh, who worked on the project. To get around these problematic generalizations, SHADES was built using 16 languages from 37 geopolitical regions. SHADES works by probing how a model responds when it's exposed to stereotypes in different ways. The researchers exposed the models to each stereotype within the data set, including through automated prompts, which generated a bias score.
Apr-30-2025, 09:41:25 GMT
- Technology: