AITopics | human moderator

human moderator

Question the Questions: Auditing Representation in Online Deliberative Processes

De, Soham, Gelauff, Lodewijk, Goel, Ashish, Milli, Smitha, Procaccia, Ariel, Siu, Alice

arXiv.org Artificial Intelligence | Nov-7-2025

A central feature of many deliberative processes, such as citizens' assemblies and deliberative polls, is the opportunity for participants to engage directly with experts. While participants are typically invited to propose questions for expert panels, only a limited number can be selected due to time constraints. This raises the challenge of how to choose a small set of questions that best represent the interests of all participants. We introduce an auditing framework for measuring the level of representation provided by a slate of questions, based on the social choice concept known as justified representation (JR). We present the first algorithms for auditing JR in the general utility setting, with our most efficient algorithm achieving a runtime of $O(mn\log n)$, where $n$ is the number of participants and $m$ is the number of proposed questions. We apply our auditing methods to historical deliberations, comparing the representativeness of (a) the actual questions posed to the expert panel (chosen by a moderator), (b) participants' questions chosen via integer linear programming, and (c) summary questions generated by large language models (LLMs). Our results highlight both the promise and current limitations of LLMs in supporting deliberative processes. By integrating our methods into an online deliberation platform that has been used for hundreds of deliberations across more than 50 countries, we make it easy for practitioners to audit and improve representation in future deliberations.
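To make the idea of auditing justified representation concrete, here is a minimal sketch in Python. It assumes the classical approval-based definition of JR (each participant simply approves or disapproves each question), which is simpler than the general utility setting the abstract addresses, and it is not the paper's algorithm. The function name `audit_jr` and the input layout are hypothetical, chosen only for illustration.

```python
from collections import Counter


def audit_jr(approvals, slate):
    """Return the questions that witness a JR violation (empty list if JR holds).

    Assumption: approval-based JR, not the paper's general utility setting.
    approvals: list of sets; approvals[i] holds the questions participant i approves.
    slate: set of selected questions (size k >= 1).
    """
    n, k = len(approvals), len(slate)
    if k == 0:
        raise ValueError("slate must be non-empty")
    # Participants who approve nothing on the slate are unrepresented.
    unrepresented = [a for a in approvals if not (a & slate)]
    # For each question, count the unrepresented participants who approve it.
    support = Counter(q for a in unrepresented for q in a)
    # JR fails if at least n/k unrepresented participants agree on one question.
    return sorted(q for q, s in support.items() if s * k >= n)
```

For example, with four participants approving `{"q1"}`, `{"q1"}`, `{"q2"}` and `{"q3"}`, the slate `{"q2", "q3"}` leaves a group of two (the threshold is 4/2 = 2) that agrees on `q1` and is unrepresented, so the audit reports `q1`. The slate `{"q1", "q2"}` passes.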

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2511.04588

Country: North America > United States (0.94)

Genre: Research Report > New Finding (0.88)

Industry: Government > Voting & Elections (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

RedHerring Attack: Testing the Reliability of Attack Detection

Rusert, Jonathan

arXiv.org Artificial Intelligence | Sep-26-2025

In response to adversarial text attacks, attack detection models have been proposed and shown to successfully identify text modified by adversaries. Attack detection models can be leveraged to provide an additional check for NLP models and give signals for human input. However, the reliability of these models has not yet been thoroughly explored. Thus, we propose and test a novel attack setting and attack, RedHerring. RedHerring aims to make attack detection models unreliable by modifying a text to cause the detection model to predict an attack, while keeping the classifier correct. This creates a tension between the classifier and detector. If a human sees that the detector is giving an "incorrect" prediction while the classifier gives a correct one, then the human will see the detector as unreliable. We test this novel threat model on 4 datasets against 3 detectors defending 4 classifiers. We find that RedHerring is able to drop detection accuracy by between 20 and 71 points, while maintaining (or improving) classifier accuracy. As an initial defense, we propose a simple confidence check which requires no retraining of the classifier or detector and increases detection accuracy greatly. This novel threat model offers new insights into how adversaries may target detection models.

classifier, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.20691

Country:

Europe (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Scalable Evaluation of Online Moderation Strategies via Synthetic Simulations

Tsirmpas, Dimitris, Androutsopoulos, Ion, Pavlopoulos, John

arXiv.org Artificial Intelligence | Mar-13-2025

Despite the ever-growing importance of online moderation, there has been no large-scale study evaluating the effectiveness of alternative moderation strategies. This is largely due to the lack of appropriate datasets, and the difficulty of getting human discussants, moderators, and evaluators involved in multiple experiments. In this paper, we propose a methodology for leveraging synthetic experiments performed exclusively by Large Language Models (LLMs) to initially bypass the need for human participation in experiments involving online moderation. We evaluate six LLM moderation configurations: two currently used real-life moderation strategies (guidelines issued for human moderators for online moderation and real-life facilitation), two baseline strategies (guidelines elicited for LLM alignment work, and LLM moderation with minimal prompting), a baseline with no moderator at all, as well as our own proposed strategy inspired by a Reinforcement Learning (RL) formulation of the problem. We find that our own moderation strategy significantly outperforms established moderation guidelines, as well as out-of-the-box LLM moderation. We also find that smaller LLMs, with less intensive instruction-tuning, can create more varied discussions than larger models. To run these experiments, we create and release an efficient, purpose-built, open-source Python framework, dubbed "SynDisco", to easily simulate hundreds of discussions using LLM user-agents and moderators. Additionally, we release the Virtual Moderation Dataset (VMD), a large dataset of LLM-generated and LLM-annotated discussions, generated by three families of open-source LLMs, accompanied by an exploratory analysis of the dataset.

moderation strategy, moderator, semanticscholar, (13 more...)

arXiv.org Artificial Intelligence

2503.16505

Country:

Europe > Greece (0.04)
North America > United States > California (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(13 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government > Regional Government (0.67)
Media > News (0.46)
Law Enforcement & Public Safety > Terrorism (0.46)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

SLM-Mod: Small Language Models Surpass LLMs at Content Moderation

Zhan, Xianyang, Goyal, Agam, Chen, Yilun, Chandrasekharan, Eshwar, Saha, Koustuv

arXiv.org Artificial Intelligence | Oct-16-2024

Large language models (LLMs) have shown promise in many natural language understanding tasks, including content moderation. However, these models can be expensive to query in real-time and do not allow for a community-specific approach to content moderation. To address these challenges, we explore the use of open-source small language models (SLMs) for community-specific content moderation tasks. We fine-tune and evaluate SLMs (less than 15B parameters) by comparing their performance against much larger open- and closed-source models. Using 150K comments from 15 popular Reddit communities, we find that SLMs outperform LLMs at content moderation, with 11.5% higher accuracy and 25.7% higher recall on average across all communities. We further show the promise of cross-community content moderation, which has implications for new communities and the development of cross-platform moderation techniques. Finally, we outline directions for future work on language-model-based content moderation. Code and links to HuggingFace models can be found at https://github.com/AGoyal0512/SLM-Mod.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.13155

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.47)

Industry:

Media > News (0.49)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Large Language Models for Automatic Detection of Sensitive Topics

Wen, Ruoyu, Crowe, Stephanie Elena, Gupta, Kunal, Li, Xinyue, Billinghurst, Mark, Hoermann, Simon, Allan, Dwain, Nassani, Alaeddin, Piumsomboon, Thammathip

arXiv.org Artificial Intelligence | Sep-2-2024

Sensitive information detection is crucial in content moderation to maintain safe online communities. Assisting in this traditionally manual process could relieve human moderators from overwhelming and tedious tasks, allowing them to focus solely on flagged content that may pose potential risks. Rapidly advancing large language models (LLMs) are known for their capability to understand and process natural language and so present a potential solution to support this process. This study explores the capabilities of five LLMs for detecting sensitive messages in the mental well-being domain within two online datasets and assesses their performance in terms of accuracy, precision, recall, F1 scores, and consistency. Our findings indicate that LLMs have the potential to be integrated into the moderation workflow as a convenient and precise detection tool. The best-performing model, GPT-4o, achieved an average accuracy of 99.5% and an F1-score of 0.99. We discuss the advantages and potential challenges of using LLMs in the moderation workflow and suggest that future research should address the ethical considerations of utilising this technology.
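For readers unfamiliar with the metrics named in this abstract, the following Python sketch shows how accuracy, precision, recall and F1 are conventionally computed for a binary "sensitive / not sensitive" labelling task. It uses the standard textbook definitions and is not the study's evaluation code. The function name `binary_metrics` and the label encoding (1 = sensitive, 0 = not sensitive) are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels.

    Assumption: labels are encoded as 1 = sensitive, 0 = not sensitive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # flagged, truly sensitive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged, not sensitive
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed sensitive message
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly left alone
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when nothing is flagged or nothing is sensitive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Reporting F1 alongside accuracy matters in moderation settings because sensitive messages are often a small minority, so a model that flags nothing can still score a high accuracy while its recall is zero.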

accuracy, conference acronym, llm, (12 more...)

arXiv.org Artificial Intelligence

2409.0094

Country:

Africa > Zimbabwe (0.14)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Texas (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Consumer Health (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Vicarious Offense and Noise Audit of Offensive Speech Classifiers: Unifying Human and Machine Disagreement on What is Offensive

Weerasooriya, Tharindu Cyril, Dutta, Sujan, Ranasinghe, Tharindu, Zampieri, Marcos, Homan, Christopher M., KhudaBukhsh, Ashiqur R.

arXiv.org Artificial Intelligence | Nov-9-2023

Offensive speech detection is a key component of content moderation. However, what is offensive can be highly subjective. This paper investigates how machine and human moderators disagree on what is offensive when it comes to real-world social web political discourse. We show that (1) there is extensive disagreement among the moderators (humans and machines); and (2) human and large-language-model classifiers are unable to predict how other human raters will respond, based on their political leanings. For (1), we conduct a noise audit at an unprecedented scale that combines both machine and human responses. For (2), we introduce a first-of-its-kind dataset of vicarious offense. Our noise audit reveals that moderation outcomes vary wildly across different machine moderators. Our experiments with human moderators suggest that political leanings combined with sensitive issues affect both first-person and vicarious offense. The dataset is available through https://github.com/Homan-Lab/voiced.

annotator, machine moderator, moderator, (11 more...)

arXiv.org Artificial Intelligence

2301.12534

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Alaska (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(10 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Leisure & Entertainment (1.00)
Information Technology (1.00)
(6 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

OpenAI is using GPT-4 to build an AI-powered content moderation system

Engadget | Aug-15-2023, 18:49:33 GMT

Content moderation has been one of the thorniest issues on the internet for decades. It's a difficult subject matter for anyone to tackle, considering the subjectivity that goes hand-in-hand with figuring out what content should be permissible on a given platform. ChatGPT maker OpenAI thinks it can help and it has been putting GPT-4's content moderation skills to the test. It's using the large multimodal model "to build a content moderation system that is scalable, consistent and customizable." The company wrote in a blog post that GPT-4 can not only help make content moderation decisions, but aid in developing policies and swiftly iterating on policy changes, "reducing the cycle from months to hours."

large language model, machine learning, natural language, (14 more...)

Engadget

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.79)

AI moderation will cause more harm than good

#artificialintelligence | Nov-21-2022, 13:40:38 GMT

Creating a game with a large, highly engaged online player base and an active community is, for many companies, right at the top of their wishlist. When they're really well managed, these games are a license to print money, to the extent that a single game can become a primary commercial driver of a pretty large company. Games like Fortnite, World of Warcraft, Call of Duty, Grand Theft Auto V, and Final Fantasy XIV, to name but a few, have become central to the ongoing success of the publishers who created and operate them. Their importance rests on the fact that while many popular franchises can rely on a huge launch for each new instalment, these games never actually stop being played and making money. It's no wonder that executives around the industry get dollar signs in their eyes when anyone starts talking about service-based games with high engagement. There are, of course, downsides.

ai system, moderation, moderator, (16 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

How AI Is Moderating Online Content

#artificialintelligence | Aug-29-2022, 08:44:18 GMT

AI can help flag harmful or offensive content faster and more effectively! Whether it be by posting a photo on Instagram or writing a blog post, we're all adding more information to the internet. With over 4.62 billion people using social media, there are bound to be some bad eggs creating harmful or deceitful content. To make sure that users are exposed to as little bad content as possible, websites practice content moderation. Content moderation is the process of regulating and monitoring user-generated content based on a set of pre-existing rules and guidelines.

moderation, moderator, user-generated content, (9 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

AI is not smart enough to solve Meta's content-policing problems, whistleblowers say

#artificialintelligence | Jun-15-2022, 13:54:08 GMT

Artificial intelligence is nowhere near good enough to address problems facing content moderation on Facebook, according to whistleblower Frances Haugen. Haugen appeared at an event in London on Tuesday evening with Daniel Motaung, a former Facebook moderator who is suing the company in Kenya, accusing it of human trafficking. Meta has praised the efficacy of its AI systems in the past. CEO Mark Zuckerberg told a Congressional hearing in March 2021 that the company relies on AI to weed out over 95% of "hate speech content." In February this year, Zuckerberg said the company wants to get its AI to a "human level" of intelligence.

haugen, moderator, motaung, (15 more...)

#artificialintelligence

Country:

Europe > France (0.62)
North America > United States > New York (0.06)
Africa > Kenya > Nairobi City County > Nairobi (0.06)

Industry:

Law (0.92)
Information Technology > Services (0.73)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)