Decoding the Rule Book: Extracting Hidden Moderation Criteria from Reddit Communities

Kim, Youngwoo, Beniwal, Himanshu, Johnson, Steven L., Hartvigsen, Thomas

arXiv.org Artificial Intelligence

Effective content moderation systems require explicit classification criteria, yet online communities like subreddits often operate with diverse, implicit standards. This work introduces a novel approach to identify and extract these implicit criteria from historical moderation data using an interpretable architecture. We represent moderation criteria as score tables of lexical expressions associated with content removal, enabling systematic comparison across different communities. Our experiments demonstrate that these extracted lexical patterns effectively replicate the performance of neural moderation models while providing transparent insights into decision-making processes. The resulting criteria matrix reveals significant variations in how seemingly shared norms are actually enforced, uncovering previously undocumented moderation patterns including community-specific tolerances for language, features for topical restrictions, and underlying subcategories of the toxic speech classification.
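The score-table representation described above can be pictured as a simple additive lexical classifier: each expression carries a removal-association score, and a post is flagged when its summed score crosses a per-community threshold. The sketch below is illustrative only; the expressions, scores, and threshold are invented assumptions, not values from the paper's extracted criteria matrix.

```python
# Hypothetical sketch of a lexical score-table moderator. Every value in
# `table` and the threshold below are illustrative assumptions.

def score_post(post: str, score_table: dict[str, float], threshold: float) -> bool:
    """Return True if the post's cumulative lexical score suggests removal."""
    tokens = post.lower().split()
    total = sum(score_table.get(tok, 0.0) for tok in tokens)
    return total >= threshold

# Illustrative score table for one community (not from the paper's data).
table = {"spam": 0.9, "buy": 0.4, "free": 0.3, "discussion": -0.2}

print(score_post("buy free spam now", table, 1.0))      # True (0.4+0.3+0.9 = 1.6)
print(score_post("great discussion here", table, 1.0))  # False (-0.2)
```

Because the representation is a flat table of scores, comparing two communities reduces to comparing the tables entry by entry, which is what makes the cross-community criteria matrix possible.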


Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing

Yadav, Neemesh, Liu, Jiarui, Ortu, Francesco, Ensafi, Roya, Jin, Zhijing, Mihalcea, Rada

arXiv.org Artificial Intelligence

The ability of Natural Language Processing (NLP) methods to categorize text into multiple classes has motivated their use in online content moderation tasks, such as hate speech and fake news detection. However, there is limited understanding of how or why these methods make such decisions, or why certain content is moderated in the first place. To investigate the hidden mechanisms behind content moderation, we explore multiple directions: 1) training classifiers to reverse-engineer content moderation decisions across countries; 2) explaining content moderation decisions by analyzing Shapley values and LLM-guided explanations. Our primary focus is on content moderation decisions made across countries, using pre-existing corpora sampled from the Twitter Stream Grab. Our experiments reveal interesting patterns in censored posts, both across countries and over time. Through human evaluations of LLM-generated explanations across three LLMs, we assess the effectiveness of using LLMs in content moderation. Finally, we discuss potential future directions, as well as the limitations and ethical considerations of this work. Our code and data are available at https://github.com/causalNLP/censorship
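The Shapley-value analysis mentioned above attributes a classifier's moderation score to individual input features. As a toy illustration of the idea (not the paper's pipeline, which would apply a library such as shap to a trained classifier over real censored posts), the exact Shapley values of a small hypothetical word-presence model can be computed directly from the definition:

```python
# Exact Shapley values over a tiny feature set; exponential in the number of
# features, so only viable for toy examples. The "model" here is a
# hypothetical additive censorship-score function, an illustrative assumption.
from itertools import combinations
from math import factorial

def shapley(features, value_fn):
    """Compute exact Shapley values phi[f] for each feature f."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value_fn(set(subset) | {f}) - value_fn(set(subset)))
        phi[f] = total
    return phi

# Hypothetical model: the score is the sum of weights of words present.
weights = {"protest": 0.6, "weather": 0.0, "regime": 0.8}
value = lambda present: sum(weights[w] for w in present)

print(shapley(list(weights), value))  # for an additive model, phi equals the weights
```

For a purely additive model each feature's Shapley value collapses to its weight; the method becomes informative precisely when the classifier has interactions between features, as neural moderation models do.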


Watch Your Language: Investigating Content Moderation with Large Language Models

Kumar, Deepak, AbuHashem, Yousef, Durumeric, Zakir

arXiv.org Artificial Intelligence

Large language models (LLMs) have exploded in popularity due to their ability to perform a wide array of natural language tasks. Text-based content moderation is one LLM use case that has received recent enthusiasm; however, there is little research investigating how LLMs perform in content moderation settings. In this work, we evaluate a suite of commodity LLMs on two common content moderation tasks: rule-based community moderation and toxic content detection. For rule-based community moderation, we instantiate 95 subcommunity-specific LLMs by prompting GPT-3.5 with rules from 95 Reddit subcommunities. We find that GPT-3.5 is effective at rule-based moderation for many communities, achieving a median accuracy of 64% and a median precision of 83%. For toxicity detection, we evaluate a suite of commodity LLMs (GPT-3, GPT-3.5, GPT-4, Gemini Pro, LLAMA 2) and show that LLMs significantly outperform currently widespread toxicity classifiers. However, recent increases in model size add only marginal benefit to toxicity detection, suggesting a potential performance plateau for LLMs on toxicity detection tasks. We conclude by outlining avenues for future work in studying LLMs and content moderation.
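The "one LLM per subcommunity" setup above amounts to assembling a moderation prompt from each community's rule list. The sketch below shows one plausible way such a prompt might be built; the template, rules, and post are illustrative assumptions, and the actual GPT-3.5 prompts and API call used in the paper are not reproduced here.

```python
# Hypothetical prompt assembly for rule-based community moderation. The
# template wording is an assumption, not the paper's prompt; in practice the
# returned string would be sent to a chat-completion API such as GPT-3.5's.

def build_moderation_prompt(community: str, rules: list[str], post: str) -> str:
    """Construct a subcommunity-specific moderation prompt from its rules."""
    rule_block = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(rules))
    return (
        f"You are a moderator for r/{community}. Community rules:\n"
        f"{rule_block}\n\n"
        f"Post: {post}\n"
        "Does this post violate any rule? Answer YES or NO and cite the rule."
    )

prompt = build_moderation_prompt(
    "askscience",
    ["Be civil.", "No medical advice.", "Answers must cite sources."],
    "Just take ibuprofen, it always works for me.",
)
print(prompt)
```

Swapping in a different community's rule list yields a different "instantiated" moderator from the same template, which is how 95 subcommunity-specific models can be produced without any training.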


Safety and Fairness for Content Moderation in Generative Models

Hao, Susan, Kumar, Piyush, Laszlo, Sarah, Poddar, Shivani, Radharapu, Bhaktipriya, Shelby, Renee

arXiv.org Artificial Intelligence

With significant advances in generative AI, new technologies are rapidly being deployed with generative components. Generative models are typically trained on large datasets, resulting in model behaviors that can mimic the worst of the content in the training data. Responsible deployment of generative technologies requires content moderation strategies, such as safety input and output filters. Here, we provide a theoretical framework for conceptualizing responsible content moderation of text-to-image generative technologies, including a demonstration of how to empirically measure the constructs we enumerate. We define and distinguish the concepts of safety, fairness, and metric equity, and enumerate example harms that can come in each domain. We then provide a demonstration of how the defined harms can be quantified. We conclude with a summary of how the style of harms quantification we demonstrate enables data-driven content moderation decisions.


Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild

Son, Donghyun, Lew, Byounggyu, Choi, Kwanghee, Baek, Yongsu, Choi, Seungwoo, Shin, Beomjun, Ha, Sungjoo, Chang, Buru

arXiv.org Artificial Intelligence

Social media platforms struggle to protect users from harmful content through content moderation. These platforms have recently leveraged machine learning models to cope with the vast amount of user-generated content daily. Since moderation policies vary depending on countries and types of products, it is common to train and deploy the models per policy. However, this approach is highly inefficient, especially when the policies change, requiring dataset re-labeling and model re-training on the shifted data distribution. To alleviate this cost inefficiency, social media platforms often employ third-party content moderation services that provide prediction scores of multiple subtasks, such as predicting the existence of underage personnel, rude gestures, or weapons, instead of directly providing final moderation decisions. However, making a reliable automated moderation decision from the prediction scores of the multiple subtasks for a specific target policy has not been widely explored yet. In this study, we formulate real-world scenarios of content moderation and introduce a simple yet effective threshold optimization method that searches the optimal thresholds of the multiple subtasks to make a reliable moderation decision in a cost-effective way. Extensive experiments demonstrate that our approach shows better performance in content moderation compared to existing threshold optimization methods and heuristics.
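The threshold-optimization idea above can be sketched as a search over per-subtask cutoffs: given third-party prediction scores for each subtask and the target policy's labels on a validation set, choose one threshold per subtask so that flagging content when any subtask exceeds its cutoff best matches the policy decisions. The data, the any-subtask decision rule, and the exhaustive grid search below are illustrative assumptions; the paper's optimizer and aggregation may differ.

```python
# Illustrative per-subtask threshold search over toy validation data.
from itertools import product

def f1(preds, labels):
    """F1 score of binary predictions against binary labels."""
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def search_thresholds(scores, labels, grid):
    """Exhaustive grid search over per-subtask thresholds (fine for few subtasks)."""
    best, best_f1 = None, -1.0
    n_tasks = len(scores[0])
    for thresh in product(grid, repeat=n_tasks):
        # Flag a post if ANY subtask score meets its threshold.
        preds = [any(s >= t for s, t in zip(row, thresh)) for row in scores]
        score = f1(preds, labels)
        if score > best_f1:
            best, best_f1 = thresh, score
    return best, best_f1

# Toy validation set: rows of hypothetical (underage, weapon) subtask scores
# with the target policy's moderation labels.
scores = [(0.9, 0.1), (0.2, 0.8), (0.1, 0.1), (0.6, 0.2)]
labels = [True, True, False, False]

print(search_thresholds(scores, labels, grid=[0.3, 0.5, 0.7]))
```

When a policy changes, only the labels and thresholds change; the underlying subtask models stay fixed, which is the cost saving the abstract describes.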


Users question AI's ability to moderate online harassment

#artificialintelligence

New Cornell University research finds that both the type of moderator (human or AI) and the "temperature" of harassing content online influence people's perception of the moderation decision and the moderation system. Now published in Big Data & Society, the study used a custom social media site, on which people can post pictures of food and comment on other posts. The site contains a simulation engine, Truman, an open-source platform that mimics other users' behaviors (likes, comments, posts) through preprogrammed bots created and curated by researchers. The Truman platform, named after the 1998 film "The Truman Show," was developed at the Cornell Social Media Lab led by Natalie Bazarova, professor of communication. "The Truman platform allows researchers to create a controlled yet realistic social media experience for participants, with social and design versatility to examine a variety of research questions about human behaviors in social media," Bazarova said.


Will AI be able to moderate online discussions like humans?

#artificialintelligence

Some artificial intelligence products have become so advanced in online discussion moderation that they are no longer confused by colloquial language, neologisms, or spelling mistakes. AI is able to take on routine human tasks, but cannot fully replace human intelligence. Online discussions abound with hate speech and off-topic comments, causing massive headaches for media companies. Legislation requires that illegal messages be removed, and users are more content if they can avoid becoming the target of inappropriate insults. The volumes of comments posted on discussion forums and below news articles can be staggering, and their proper moderation may sometimes require infeasible amounts of manpower.