Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations

David Hartmann, Amin Oueslati, Dimitri Staufer, Lena Pohlmann, Simon Munzert, Hendrik Heuer

Mar-3-2025, arXiv.org Artificial Intelligence

Commercial content moderation APIs are marketed as scalable solutions to combat online hate speech. However, the reliance on these APIs risks both silencing legitimate speech, called over-moderation, and failing to protect online platforms from harmful speech, known as under-moderation. To assess such risks, this paper introduces a framework for auditing black-box NLP systems. Using the framework, we systematically evaluate five widely used commercial content moderation APIs. Analyzing five million queries based on four datasets, we find that APIs frequently rely on group identity terms, such as "black", to predict hate speech. While OpenAI's and Amazon's services perform slightly better, all providers under-moderate implicit hate speech, which uses codified messages, especially against LGBTQIA+ individuals. Simultaneously, they over-moderate counter-speech, reclaimed slurs and content related to Black, LGBTQIA+, Jewish, and Muslim people. We recommend that API providers offer better guidance on API implementation and threshold setting and more transparency on their APIs' limitations. Warning: This paper contains offensive and hateful terms and concepts. We have chosen to reproduce these terms for reasons of transparency.
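
As an illustration of the two error types named in the abstract, the sketch below treats over-moderation as the false-positive rate (benign text flagged) and under-moderation as the false-negative rate (hateful text missed), computed per target group at a chosen score threshold. This is a minimal sketch under stated assumptions and not the paper's audit framework: the function name `moderation_error_rates`, the record layout, the group labels, and the toy scores are all hypothetical, and no real API is called.

```python
from collections import defaultdict


def moderation_error_rates(records, threshold=0.5):
    """Compute per-group over- and under-moderation rates.

    records: iterable of (group, score, is_hate) tuples, where `score` is a
    hypothetical moderation score in [0, 1] and `is_hate` is the gold label.
    A text counts as flagged when score >= threshold.
    Returns {group: {"over": false-positive rate, "under": false-negative rate}};
    a rate is None when the group has no benign (or no hateful) examples.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, score, is_hate in records:
        flagged = score >= threshold
        c = counts[group]
        if is_hate:
            c["pos"] += 1
            if not flagged:
                c["fn"] += 1  # hateful text missed: under-moderation
        else:
            c["neg"] += 1
            if flagged:
                c["fp"] += 1  # benign text flagged: over-moderation

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "over": c["fp"] / c["neg"] if c["neg"] else None,
            "under": c["fn"] / c["pos"] if c["pos"] else None,
        }
    return rates


# Made-up toy records for illustration only; these are not results from the paper.
toy = [
    ("group_a", 0.9, True),
    ("group_a", 0.7, False),
    ("group_a", 0.2, False),
    ("group_b", 0.3, True),
    ("group_b", 0.8, True),
    ("group_b", 0.1, False),
]

rates = moderation_error_rates(toy, threshold=0.5)
for group, r in sorted(rates.items()):
    print(group, r)
```

Re-running the function with different `threshold` values shows how the same scores trade one error type against the other, which is why threshold guidance matters for anyone deploying such an API.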

Country:
- South America
  - Paraguay > Asunción
    - Asunción (0.04)
  - Colombia > Meta Department
    - Villavicencio (0.04)
  - Chile > Santiago Metropolitan Region
    - Santiago Province > Santiago (0.04)
  - Brazil > Rio de Janeiro
    - Rio de Janeiro (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - Maryland > Baltimore (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.14)
    - Georgia > Fulton County
      - Atlanta (0.04)
    - Washington > King County
      - Seattle (0.14)
    - California
      - Santa Clara County > Palo Alto (0.04)
      - San Francisco County > San Francisco (0.04)
      - San Diego County > San Diego (0.04)
      - Los Angeles County > Long Beach (0.04)
    - New York > New York County
      - New York City (0.05)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Greece > Central Macedonia
    - Thessaloniki (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Germany
    - Berlin (0.04)
    - Hamburg (0.04)
    - Brandenburg > Potsdam (0.04)
    - Baden-Württemberg > Stuttgart Region
      - Stuttgart (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - United Kingdom > England
    - West Yorkshire > Leeds (0.04)
    - Oxfordshire > Oxford (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Singapore (0.04)
  - Indonesia > Bali (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)
  - Japan > Honshū
    - Chūbu > Toyama Prefecture > Toyama (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine (1.00)
- Information Technology > Services (0.68)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
- Law > Civil Rights & Constitutional Law (0.67)
- Media (0.67)
- Government > Regional Government (0.67)

Technology:
- Information Technology
  - Communications > Social Media (1.00)
  - Artificial Intelligence
    - Issues > Social & Ethical Issues (1.00)
    - Natural Language > Large Language Model (0.89)
    - Machine Learning > Performance Analysis
      - Accuracy (0.94)
