Researchers release dataset to expose racial, religious, and gender biases in language models
Natural language models are the building blocks of apps including machine translators, text summarizers, chatbots, and writing assistants. But there's growing evidence showing that these models risk reinforcing undesirable stereotypes, mostly because a portion of the training data is commonly sourced from communities with gender, race, and religious prejudices. For example, OpenAI's GPT-3 places words like "naughty" or "sucked" near female pronouns and "Islam" near words like "terrorism." A new study from researchers affiliated with Amazon and the University of California, Santa Barbara aims to shed light specifically on biases in open-ended English natural language generation. The researchers created what they claim is the largest benchmark dataset of its kind containing 23,679 prompts, 5 domains, and 43 subgroups extracted from Wikipedia articles.
Feb-6-2021, 04:13:24 GMT
- Country:
- North America > United States > California > Santa Barbara County > Santa Barbara (0.26)
- Technology: