Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing
Yadav, Neemesh, Liu, Jiarui, Ortu, Francesco, Ensafi, Roya, Jin, Zhijing, Mihalcea, Rada
arXiv.org Artificial Intelligence
The ability of Natural Language Processing (NLP) methods to categorize text into multiple classes has motivated their use in online content moderation tasks, such as hate speech and fake news detection. However, there is limited understanding of how or why these methods make such decisions, or why certain content is moderated in the first place. To investigate the hidden mechanisms behind content moderation, we explore two directions: 1) training classifiers to reverse-engineer content moderation decisions across countries; and 2) explaining content moderation decisions by analyzing Shapley values and LLM-guided explanations. Our primary focus is on content moderation decisions made across countries, using pre-existing corpora sampled from the Twitter Stream Grab. Our experiments reveal interesting patterns in censored posts, both across countries and over time. Through human evaluations of LLM-generated explanations across three LLMs, we assess the effectiveness of using LLMs in content moderation. Finally, we discuss potential future directions, as well as the limitations and ethical considerations of this work. Our code and data are available at https://github.com/causalNLP/censorship
Mar-10-2025
- Country:
- Asia
- British Indian Ocean Territory > Diego Garcia (0.04)
- China > Hong Kong (0.04)
- India (0.16)
- Middle East
- Republic of Türkiye (0.06)
- Saudi Arabia > Asir Province
- Abha (0.04)
- Syria (0.04)
- Russia (0.15)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Denmark > Capital Region
- Copenhagen (0.04)
- France (0.06)
- Germany
- Baden-Württemberg > Tübingen Region
- Tübingen (0.04)
- Bavaria > Middle Franconia
- Nuremberg (0.04)
- Berlin (0.04)
- Italy > Friuli Venezia Giulia
- Trieste Province > Trieste (0.04)
- Middle East > Malta
- Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Russia > North Caucasian Federal District
- Chechen Republic (0.04)
- Sweden (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.28)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- California > San Diego County
- San Diego (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Michigan (0.04)
- New Mexico > Santa Fe County
- Santa Fe (0.04)
- New York > New York County
- New York City (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- South America > Brazil (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Government (1.00)
- Information Technology (1.00)
- Law > Civil Rights & Constitutional Law (1.00)
- Media (1.00)
- Technology: