Researchers quantify bias in Reddit content sometimes used to train AI

#artificialintelligence 

In a paper published on the preprint server arXiv.org, researchers quantify the degree of bias in the language used by Reddit communities. This alone isn't surprising, but the problem is that data from these communities are often used to train large language models like OpenAI's GPT-3. That in turn is important because, as OpenAI itself notes, this sort of bias leads to placing words like "naughty" or "sucked" near female pronouns and "Islam" near words like "terrorism."

The scientists' approach uses representations of words called embeddings to discover and categorize language biases, which could enable data scientists to trace the severity of bias in different communities and take steps to counteract it. To spotlight examples of potentially offensive content in Reddit subcommunities, the method takes a language model and two sets of words representing concepts to compare, and identifies the words in a given community that are most biased toward each concept.
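The core idea can be sketched in a few lines: score each candidate word by how much closer its embedding sits to one concept set than the other, then rank by that difference. The toy vectors, word lists, and function names below are illustrative assumptions, not the paper's actual data or code; a real analysis would train embeddings on each community's comments.

```python
import math

# Hypothetical 3-d embeddings for illustration only; real embeddings
# would be learned from a specific Reddit community's text.
EMBEDDINGS = {
    "she":      [0.9, 0.1, 0.0],
    "he":       [0.1, 0.9, 0.0],
    "nurse":    [0.8, 0.2, 0.1],
    "engineer": [0.2, 0.8, 0.1],
    "kind":     [0.5, 0.5, 0.5],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bias_ranking(candidates, concept_a, concept_b, emb=EMBEDDINGS):
    """Rank candidate words by mean similarity to concept set A
    minus mean similarity to concept set B. Positive scores lean
    toward A, negative toward B, near zero is roughly neutral."""
    scores = {}
    for word in candidates:
        sim_a = sum(cosine(emb[word], emb[c]) for c in concept_a) / len(concept_a)
        sim_b = sum(cosine(emb[word], emb[c]) for c in concept_b) / len(concept_b)
        scores[word] = sim_a - sim_b
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = bias_ranking(["nurse", "engineer", "kind"], ["she"], ["he"])
```

With these toy vectors, "nurse" ranks as most biased toward the "she" concept and "engineer" toward "he", while "kind" scores near zero. Applied per community with real embeddings, the same ranking surfaces the most concept-biased vocabulary of that community.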
