Content Moderator


Is the AI boom finally starting to slow down?

The Guardian

Drive down the 280 freeway in San Francisco and you might believe AI is everywhere, and everything. Nearly every billboard advertises an AI-related product: "We've Automated 2,412 BDRs." "All that AI and still no ROI?" "Cheap on-demand GPU clusters." It's hard to know if you're interpreting the industry jargon correctly while zooming past in your vehicle. The signs are just one example of the tech industry's en masse pivot to AI, a technology that the executives who have the most to gain from it say will be universe-shifting, inevitable and unavoidable. In California's tech heartland, every company is now an AI company, just like every company became a tech company sometime in the 2010s.


When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines

Pendse, Sachin R., Gergle, Darren, Kornfield, Rachel, Meyerhoff, Jonah, Mohr, David, Suh, Jina, Wescott, Annie, Williams, Casey, Schleider, Jessica

arXiv.org Artificial Intelligence

Red-teaming is a core part of the infrastructure that ensures that AI models do not produce harmful content. Unlike past technologies, the black-box nature of generative AI systems necessitates a uniquely interactional mode of testing, one in which individuals on red teams actively interact with the system, leveraging natural language to simulate malicious actors and solicit harmful outputs. This interactional labor done by red teams can result in mental health harms that are uniquely tied to the adversarial engagement strategies necessary to effectively red team. The importance of ensuring that generative AI models do not propagate societal or individual harm is widely recognized -- a less visible foundation of end-to-end AI safety is the protection of the mental health and wellbeing of those who work to keep model outputs safe. In this paper, we argue that the unmet mental health needs of AI red-teamers are a critical workplace safety concern. Through analyzing the unique mental health impacts associated with the labor done by red teams, we propose potential individual and organizational strategies that could be used to meet these needs and safeguard the mental health of red-teamers. We develop our proposed strategies by drawing parallels between common red-teaming practices and interactional labor common to other professions (including actors, mental health professionals, conflict photographers, and content moderators), describing how individuals and organizations within these professional spaces safeguard their mental health given similar psychological demands. Drawing on these protective practices, we describe how safeguards could be adapted for the distinct mental health challenges experienced by red-teaming organizations as they mitigate emerging technological risks on the new digital frontlines.


Effective Black-Box Multi-Faceted Attacks Breach Vision Large Language Model Guardrails

Yang, Yijun, Wang, Lichao, Yang, Xiao, Hong, Lanqing, Zhu, Jun

arXiv.org Artificial Intelligence

Vision Large Language Models (VLLMs) integrate visual data processing, expanding their real-world applications, but also increasing the risk of generating unsafe responses. In response, leading companies have implemented Multi-Layered safety defenses, including alignment training, safety system prompts, and content moderation. However, their effectiveness against sophisticated adversarial attacks remains largely unexplored. In this paper, we propose MultiFaceted Attack, a novel attack framework designed to systematically bypass Multi-Layered Defenses in VLLMs. It comprises three complementary attack facets: Visual Attack that exploits the multimodal nature of VLLMs to inject toxic system prompts through images; Alignment Breaking Attack that manipulates the model's alignment mechanism to prioritize the generation of contrasting responses; and Adversarial Signature that deceives content moderators by strategically placing misleading information at the end of the response. Extensive evaluations on eight commercial VLLMs in a black-box setting demonstrate that MultiFaceted Attack achieves a 61.56% attack success rate, surpassing state-of-the-art methods by at least 42.18%.


Towards Conceptualization of "Fair Explanation": Disparate Impacts of anti-Asian Hate Speech Explanations on Content Moderators

Nguyen, Tin, Xu, Jiannan, Roy, Aayushi, Daumé, Hal III, Carpuat, Marine

arXiv.org Artificial Intelligence

Recent research at the intersection of AI explainability and fairness has focused on how explanations can improve human+AI task performance as assessed by fairness measures. We propose to characterize what constitutes an explanation that is itself "fair" -- an explanation that does not adversely impact specific populations. We formulate a novel evaluation method for "fair explanations" using not just accuracy and labeling time, but also the psychological impact of explanations on different user groups across several metrics (mental discomfort, stereotype activation, and perceived workload). We apply this method in the context of content moderation of potential hate speech, and its differential impact on Asian vs. non-Asian proxy moderators, across explanation approaches (saliency maps and counterfactual explanations). We find that saliency maps generally perform better and show less evidence of disparate impact (group unfairness) and individual unfairness than counterfactual explanations. Content warning: This paper contains examples of hate speech and racially discriminatory language. The authors do not support such content. Please consider your risk of discomfort carefully before continuing reading!
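To make the saliency-map side of that comparison concrete, here is a minimal sketch of gradient-based token saliency for a toxicity classifier. The model name, the gradient-times-input heuristic, and the scoring choices are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch: gradient-based token saliency for a toxicity classifier.
# The model name and scoring choices are assumptions for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "unitary/toxic-bert"  # assumed off-the-shelf toxicity model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def token_saliency(text: str):
    """Return (token, saliency) pairs using the gradient-x-input heuristic."""
    enc = tokenizer(text, return_tensors="pt")
    # Embed the tokens ourselves so gradients can be taken w.r.t. the embeddings.
    embeds = model.get_input_embeddings()(enc["input_ids"]).detach()
    embeds.requires_grad_(True)
    out = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"])
    # Backpropagate the score of the highest-scoring class.
    out.logits.max(dim=-1).values.sum().backward()
    # Saliency per token: norm of (gradient * embedding).
    scores = (embeds.grad * embeds).norm(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"].squeeze(0))
    return list(zip(tokens, scores.tolist()))

for tok, score in token_saliency("Example comment a moderator might review."):
    print(f"{tok}\t{score:.3f}")
```

A counterfactual explanation, by contrast, would show the moderator a minimally edited version of the text whose predicted label flips, rather than per-token importance scores.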


'It's destroyed me completely': Kenyan moderators decry toll of training of AI models

The Guardian

The images pop up in Mophat Okinyi's mind when he's alone, or when he's about to sleep. Okinyi, a former content moderator for OpenAI's ChatGPT in Nairobi, Kenya, is one of four people in that role who have filed a petition to the Kenyan government calling for an investigation into what they describe as exploitative conditions for contractors reviewing the content that powers artificial intelligence programs. "It has really damaged my mental health," said Okinyi. The 27-year-old said he would view up to 700 text passages a day, many depicting graphic sexual violence. He recalls that he started avoiding people after having read texts about rapists, and found himself projecting paranoid narratives onto people around him.


Hybrid moderation in the newsroom: Recommending featured posts to content moderators

Waterschoot, Cedric, Bosch, Antal van den

arXiv.org Artificial Intelligence

Online news outlets are grappling with the moderation of user-generated content within their comment sections. We present a recommender system based on ranking class probabilities to support and empower the moderator in choosing featured posts, a time-consuming task. By combining user and textual content features we obtain an optimal classification F1-score of 0.44 on the test set. Furthermore, we observe an optimal mean NDCG@5 of 0.87 on a large set of validation articles. As an expert evaluation, content moderators assessed the output for a random selection of articles by choosing comments to feature based on the recommendations, which resulted in an NDCG score of 0.83. We conclude, first, that adding text features yields the best score, and second, that while choosing featured content remains somewhat subjective, content moderators found suitable comments in all but one of the evaluated recommendations. We end the paper by analyzing our best-performing model, a step towards transparency and explainability in hybrid content moderation.
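For a sense of the ranking-and-evaluation idea quoted above, the sketch below ranks comments by a classifier's "feature-worthy" probability and scores the ranking with NDCG@5. The probability values and editor judgements are made-up illustrations, not data from the paper.

```python
# Minimal sketch: rank comments by predicted probability of being
# feature-worthy, then evaluate the ranking with NDCG@5.
import math

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the top-k items of a ranking."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(predicted_scores, true_relevance, k=5):
    """NDCG@k: DCG of the predicted ranking divided by the ideal DCG."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_rel = [true_relevance[i] for i in order]
    ideal = dcg_at_k(sorted(true_relevance, reverse=True), k)
    return dcg_at_k(ranked_rel, k) / ideal if ideal > 0 else 0.0

# Hypothetical example: model probabilities that each comment deserves to be
# featured, and binary editor judgements of which comments actually were.
probs    = [0.91, 0.15, 0.78, 0.40, 0.66, 0.05]
featured = [1,    0,    0,    1,    1,    0]
print(f"NDCG@5 = {ndcg_at_k(probs, featured, k=5):.2f}")
```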


BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases

Zhang, Yiming, Nanduri, Sravani, Jiang, Liwei, Wu, Tongshuang, Sap, Maarten

arXiv.org Artificial Intelligence

Toxicity annotators and content moderators often default to mental shortcuts when making decisions. This can lead to subtle toxicity being missed, and to seemingly toxic but harmless content being over-detected. We introduce BiasX, a framework that enhances content moderation setups with free-text explanations of statements' implied social biases, and explore its effectiveness through a large-scale crowdsourced user study. We show that participants indeed benefit substantially from explanations when identifying subtly (non-)toxic content. The quality of explanations is critical: imperfect machine-generated explanations (+2.4% on hard toxic examples) help less than expert-written human explanations (+7.2%). Our results showcase the promise of using free-text explanations to encourage more thoughtful toxicity moderation.
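As a rough sketch of how machine-generated implied-bias explanations of the kind compared above might be produced, the snippet below prompts a chat model for a one-to-two-sentence explanation. The prompt wording and model choice are assumptions for illustration, not the BiasX pipeline itself.

```python
# Minimal sketch: generate a free-text "implied bias" explanation for a
# statement; prompt and model are illustrative assumptions, not BiasX.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def implied_bias_explanation(statement: str) -> str:
    """Ask a chat model to spell out the social bias a statement implies, if any."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat model would do
        messages=[
            {"role": "system",
             "content": ("You help content moderators think slowly. Given a "
                         "statement, explain in one or two sentences what "
                         "social bias or stereotype it implies, or say that "
                         "it implies none.")},
            {"role": "user", "content": statement},
        ],
    )
    return response.choices[0].message.content

# A moderator would see the statement alongside this explanation before labeling.
print(implied_bias_explanation("They only got the job because of a quota."))
```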


150 African Workers for ChatGPT, TikTok and Facebook Vote to Unionize at Landmark Nairobi Meeting

TIME - Tech

More than 150 workers whose labor underpins the AI systems of Facebook, TikTok and ChatGPT gathered in Nairobi on Monday and pledged to establish the first African Content Moderators Union, in a move that could have significant consequences for the businesses of some of the world's biggest tech companies. The current and former workers, all employed by third-party outsourcing companies, have provided content moderation services for AI tools used by Meta, ByteDance, and OpenAI -- the respective owners of Facebook, TikTok and the breakout AI chatbot ChatGPT. Despite the mental toll of the work, which has left many content moderators suffering from PTSD, their jobs are some of the lowest-paid in the global tech industry, with some workers earning as little as $1.50 per hour. As news of the successful vote to register the union was read out, the packed room of workers at the Mövenpick Hotel in Nairobi burst into cheers and applause, a video from the event seen by TIME shows. Confetti fell onto the stage, and jubilant music began to play as the crowd continued to cheer.


ChatGPT and the sweatshops powering the digital age

Al Jazeera

On January 18, Time magazine published revelations that alarmed, if not necessarily surprised, many who work in artificial intelligence. The news concerned ChatGPT, an advanced AI chatbot that is both hailed as one of the most intelligent AI systems built to date and feared as a new frontier in potential plagiarism and the erosion of craft in writing. Many had wondered how ChatGPT, which stands for Chat Generative Pre-trained Transformer, had improved upon earlier versions of this technology that would quickly descend into hate speech. The answer came in the Time magazine piece: dozens of Kenyan workers were paid less than $2 per hour to process an endless stream of violent and hateful content in order to make a system primarily marketed to Western users safer. It should be clear to anyone paying attention that our current paradigm of digitalisation has a labour problem. We have pivoted, and are still pivoting, away from the ideal of an open internet built around communities of shared interests to one dominated by the commercial prerogatives of a handful of companies located in specific geographies.


Users trust AI as much as humans for flagging problematic content

#artificialintelligence

Social media users may trust artificial intelligence (AI) as much as human editors to flag hate speech and harmful content, according to researchers at Penn State. The researchers said that when users think about positive attributes of machines, like their accuracy and objectivity, they show more faith in AI. However, if users are reminded about the inability of machines to make subjective decisions, their trust is lower. The findings may help developers design better AI-powered content curation systems that can handle the large amounts of information currently being generated while avoiding the perception that the material has been censored, or inaccurately classified, said S. Shyam Sundar, James P. Jimirro Professor of Media Effects in the Donald P. Bellisario College of Communications and co-director of the Media Effects Research Laboratory. "There's this dire need for content moderation on social media and more generally, online media," said Sundar, who is also an affiliate of Penn State's Institute for Computational and Data Sciences.