Identity-related Speech Suppression in Generative AI Content Moderation
Anigboro, Oghenefejiro Isaacs; Crawford, Charlie M.; Metaxa, Danaë; Friedler, Sorelle A.
arXiv.org Artificial Intelligence
Automated content moderation systems have long been used to help reduce the occurrence of violent, hateful, sexual, or otherwise undesired user-generated content online, including in online comment sections and by social media platforms [7, 19, 24]. As more content is generated by AI systems, automated content moderation techniques are being applied to the text these systems produce in order to filter unwanted content before it is shown to users [21, 22]. However, content moderation is known to suffer from identity-related biases: speech by or about marginalized identities is more likely to be incorrectly flagged as inappropriate content [5, 10, 27]. In this paper, we conduct an audit of five content moderation systems to measure identity-related speech suppression, introducing benchmark datasets and definitions to quantify these biases in the context of generative AI systems.

Previous assessments of content moderation systems have used benchmark datasets to measure effectiveness and bias. These include datasets of user-generated content, such as tweets or internet comments, that have been hand-labeled according to a content moderation rubric [2, 8]. However, most of these datasets consist of short-form content and do not include the types of text involved in generative AI systems, whether user-generated prompts or system-provided responses. Automated content moderation systems applied in generative AI settings may produce unexpected or undesired results, such as flagging PG-rated movie scripts as inappropriate content [21]. As generative AI is increasingly used for creative and expressive text generation, from schools to Hollywood, this paper is motivated by the question: whose stories won't be told?
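The abstract frames speech suppression as text by or about marginalized identities being incorrectly flagged. One common way to quantify that idea, assumed here for illustration rather than taken from the paper's own definitions, is a per-identity false positive rate: among items a rubric labels acceptable, the share a moderation system still flags, compared against a reference group. The `Example` schema, the function names, and the group labels below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    """One benchmark item (hypothetical schema for illustration)."""
    text: str
    identity: str        # identity group the text is by or about
    inappropriate: bool  # ground-truth label from a moderation rubric
    flagged: bool        # decision returned by the moderation system


def incorrect_flag_rates(examples):
    """Per-identity share of acceptable items that were flagged anyway."""
    flagged = defaultdict(int)
    acceptable = defaultdict(int)
    for ex in examples:
        if not ex.inappropriate:  # only acceptable text can be incorrectly flagged
            acceptable[ex.identity] += 1
            flagged[ex.identity] += int(ex.flagged)
    return {group: flagged[group] / acceptable[group] for group in acceptable}


def suppression_gaps(rates, reference):
    """Each group's incorrect-flag rate minus that of a reference group."""
    base = rates[reference]
    return {group: rate - base for group, rate in rates.items() if group != reference}


if __name__ == "__main__":
    data = [
        Example("a", "group_a", False, True),
        Example("b", "group_a", False, False),
        Example("c", "group_b", False, False),
        Example("d", "group_b", False, False),
        Example("e", "group_b", True, True),  # correctly flagged; excluded from rates
    ]
    rates = incorrect_flag_rates(data)
    print(rates)  # {'group_a': 0.5, 'group_b': 0.0}
    print(suppression_gaps(rates, reference="group_b"))  # {'group_a': 0.5}
```

A positive gap means acceptable text associated with that group is flagged more often than the reference group's; the choice of reference group and of rubric labels is itself an assumption an audit would need to justify.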
Sep-9-2024
- Country:
  - Africa > Middle East
    - Egypt (0.14)
  - Europe (1.00)
  - North America > United States
    - New York (0.14)
- Genre:
  - Research Report
    - Experimental Study (0.46)
    - New Finding (0.67)
- Industry:
  - Health & Medicine > Therapeutic Area
    - Neurology (0.46)
    - Psychiatry/Psychology (0.46)
  - Law (1.00)
  - Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
  - Leisure & Entertainment (1.00)
  - Media > Film (1.00)
- Technology: