Researchers quantify bias in Reddit content sometimes used to train AI
In a paper published on the preprint server Arxiv.org, researchers quantify the biased and often hateful language found in certain Reddit communities. This alone isn't surprising, but it's a problem because data from these communities is often used to train large language models like OpenAI's GPT-3. That in turn matters because, as OpenAI itself notes, this sort of bias leads models to place words like "naughty" or "sucked" near female pronouns and "Islam" near words like "terrorism."

The scientists' approach uses representations of words called embeddings to discover and categorize language biases, which could enable data scientists to trace the severity of bias across different communities and take steps to counteract it. To spotlight examples of potentially offensive content in Reddit subcommunities, the method takes a language model and two sets of words representing concepts to compare, then identifies the words in a given community that are most biased toward each concept.
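The paper's exact metric isn't reproduced in this article, but the general idea can be sketched: train or load word embeddings on a community's comments, then rank every vocabulary word by how much closer it sits to one concept's words than the other's. The sketch below, in Python, uses a common WEAT-style difference of mean cosine similarities; the function names, the toy vocabulary, and the scoring details are illustrative assumptions, not the authors' code.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def bias_score(vec, a_vecs, b_vecs):
    """Mean similarity to concept A minus mean similarity to concept B."""
    return np.mean([cosine(vec, a) for a in a_vecs]) - \
           np.mean([cosine(vec, b) for b in b_vecs])

def most_biased_words(embeddings, concept_a, concept_b, top_k=20):
    """Rank vocabulary words by how strongly they lean toward concept A.

    embeddings: dict of word -> np.ndarray, e.g. trained on one
    subreddit's comments. concept_a / concept_b: word lists defining
    the two concepts to compare (say, female vs. male pronouns).
    """
    a_vecs = [embeddings[w] for w in concept_a if w in embeddings]
    b_vecs = [embeddings[w] for w in concept_b if w in embeddings]
    seeds = set(concept_a) | set(concept_b)
    scores = {w: bias_score(v, a_vecs, b_vecs)
              for w, v in embeddings.items() if w not in seeds}
    # Highest scores = words most biased toward concept A.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy demonstration with random vectors; real use would load embeddings
# trained on a specific community's comments.
rng = np.random.default_rng(0)
vocab = ["she", "her", "he", "his", "naughty", "engineer", "nurse"]
toy = {w: rng.standard_normal(50) for w in vocab}
print(most_biased_words(toy, ["she", "her"], ["he", "his"], top_k=3))
```

Run against embeddings trained on different subreddits, the same concept pair would surface different "most biased" word lists, which is what would let analysts compare the severity of bias across communities.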
Aug-7-2020, 18:55:33 GMT