LLMs become more covertly racist with human intervention

MIT Technology Review 

Even when the two sentences had the same meaning, the models were more likely to apply adjectives like "dirty," "lazy," and "stupid" to speakers of AAE than to speakers of Standard American English (SAE). The models associated speakers of AAE with less prestigious jobs (or didn't associate them with having a job at all), and when asked to pass judgment on a hypothetical criminal defendant, they were more likely to recommend the death penalty.

An even more notable finding may be a flaw the study pinpoints in the way researchers try to correct such biases. To purge models of hateful views, companies like OpenAI, Meta, and Google use feedback training, in which human workers manually adjust the way the model responds to certain prompts. This process, often called "alignment," aims to recalibrate the millions of connections in the neural network and get the model to align more closely with desired values. The method works well to combat overt stereotypes, and leading companies have employed it for nearly a decade.
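The feedback training described here is preference-based: human raters compare model responses, and those comparisons steer further fine-tuning. Below is a minimal, self-contained sketch of that idea in PyTorch, fitting a toy reward model to pairwise human preferences with a Bradley-Terry loss. The featurizer, the tiny network, and the example data are illustrative assumptions, not any company's actual pipeline.

```python
# Minimal sketch of preference-based feedback training: human raters mark
# which of two responses they prefer, and a reward model is fit to those
# preferences with a pairwise (Bradley-Terry) loss. All components here are
# toy stand-ins chosen to keep the example self-contained and runnable.
import torch
import torch.nn as nn

VOCAB = ["helpful", "polite", "dirty", "lazy", "stupid", "thank", "you"]

def featurize(text: str) -> torch.Tensor:
    """Toy bag-of-words featurizer standing in for a real model's representation."""
    words = text.lower().split()
    return torch.tensor([float(words.count(w)) for w in VOCAB])

# (preferred response, rejected response) pairs, as a human rater might label them.
preference_pairs = [
    ("thank you for asking, happy to help", "that is a stupid question"),
    ("they sound like a helpful and polite speaker", "they sound dirty and lazy"),
]

reward_model = nn.Sequential(nn.Linear(len(VOCAB), 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

for epoch in range(200):
    loss = torch.tensor(0.0)
    for chosen, rejected in preference_pairs:
        r_chosen = reward_model(featurize(chosen))
        r_rejected = reward_model(featurize(rejected))
        # Pairwise loss: push the preferred response's reward above the rejected one's.
        loss = loss - torch.nn.functional.logsigmoid(r_chosen - r_rejected).squeeze()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained reward model scores new responses; higher is "more preferred."
print(reward_model(featurize("happy to help, thank you")).item())
print(reward_model(featurize("lazy and stupid")).item())
```

In a full feedback-training pipeline, the learned reward would then guide further fine-tuning of the language model itself; the sketch stops at the reward-model step to stay small.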
