AITopics | toxic behavior

toxic behavior

The High Cost of Incivility: Quantifying Interaction Inefficiency via Multi-Agent Monte Carlo Simulations

Mangold, Benedikt

arXiv.org Artificial Intelligence, Dec-10-2025

Workplace toxicity is widely recognized as detrimental to organizational culture, yet quantifying its direct impact on operational efficiency remains methodologically challenging due to the ethical and practical difficulties of reproducing conflict in human subjects. This study leverages Large Language Model (LLM) based Multi-Agent Systems to simulate 1-on-1 adversarial debates, creating a controlled "sociological sandbox". We employ a Monte Carlo method to simulate hundreds of discussions, measuring the convergence time (defined as the number of arguments required to reach a conclusion) between a baseline control group and treatment groups involving agents with "toxic" system prompts. Our results demonstrate a statistically significant increase of approximately 25% in the duration of conversations involving toxic participants. We propose that this "latency of toxicity" serves as a proxy for financial damage in corporate and academic settings. Furthermore, we demonstrate that agent-based modeling provides a reproducible, ethical alternative to human-subject research for measuring the mechanics of social friction.
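The Monte Carlo comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the LLM debate is replaced by a hypothetical stand-in in which each argument ends the discussion with a fixed probability, and the two probabilities (0.25 for control, 0.20 for the "toxic" group) are assumptions chosen only so that the expected convergence times differ. The permutation test is likewise an assumed choice of significance test; the abstract does not say which test was used.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def simulate_debate(p_conclude, rng, max_turns=200):
    """Hypothetical stand-in for one LLM debate: count the arguments
    exchanged until a conclusion is reached (the convergence time)."""
    for turn in range(1, max_turns + 1):
        if rng.random() < p_conclude:
            return turn
    return max_turns  # cap runs that never conclude

def permutation_p_value(control, treatment, rng, n_perm=1000):
    """One-sided permutation test for mean(treatment) > mean(control)."""
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    k = len(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[k:]) - mean(pooled[:k]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

rng = random.Random(0)
# Assumed per-argument conclusion probabilities, for illustration only.
control = [simulate_debate(0.25, rng) for _ in range(500)]
toxic = [simulate_debate(0.20, rng) for _ in range(500)]

increase = mean(toxic) / mean(control) - 1
p_value = permutation_p_value(control, toxic, rng)
print(f"relative increase in convergence time: {increase:.1%}")
print(f"one-sided permutation p-value: {p_value:.4f}")
```

In a real replication, `simulate_debate` would be swapped for a call that runs two LLM agents against each other and returns the number of arguments exchanged; the comparison logic would stay the same.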

agent, artificial intelligence, natural language, (11 more...)

arXiv.org Artificial Intelligence

2512.08345

Country: Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Government (1.00)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Evaluating Online Moderation Via LLM-Powered Counterfactual Simulations

Fidone, Giacomo, Passaro, Lucia, Guidotti, Riccardo

arXiv.org Artificial Intelligence, Nov-11-2025

Online Social Networks (OSNs) widely adopt content moderation to mitigate the spread of abusive and toxic discourse. Nonetheless, the real effectiveness of moderation interventions remains unclear due to the high cost of data collection and limited experimental control. The latest developments in Natural Language Processing pave the way for a new evaluation approach. Large Language Models (LLMs) can be successfully leveraged to enhance Agent-Based Modeling and simulate human-like social behavior with an unprecedented degree of believability. Yet, existing tools do not support simulation-based evaluation of moderation strategies. We fill this gap by designing an LLM-powered simulator of OSN conversations enabling a parallel, counterfactual simulation where toxic behavior is influenced by moderation interventions, keeping all else equal. We conduct extensive experiments, unveiling the psychological realism of OSN agents, the emergence of social contagion phenomena and the superior effectiveness of personalized moderation strategies.
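The idea of a parallel, counterfactual simulation that keeps all else equal can be sketched with a toy model. This is not the authors' simulator: the agents here are not LLMs, and the names `simulate`, `base`, and `penalty` are hypothetical. Each agent posts a toxic message with some propensity, and in the moderated world an intervention lowers that agent's propensity after each toxic post. Both worlds consume the same random draws from the same seed, so any difference in outcome is attributable to the intervention alone.

```python
import random

def simulate(moderated, seed, n_agents=20, n_steps=50, base=0.2, penalty=0.5):
    """Count toxic posts in a toy conversation.

    moderated: if True, each toxic post triggers an intervention that
               multiplies the author's toxicity propensity by `penalty`.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    propensity = [base] * n_agents
    toxic_posts = 0
    for _ in range(n_steps):
        for i in range(n_agents):
            # Exactly one draw per agent per step in both worlds, so the
            # factual and counterfactual runs stay aligned draw for draw.
            u = rng.random()
            if u < propensity[i]:
                toxic_posts += 1
                if moderated:
                    propensity[i] *= penalty
    return toxic_posts

for seed in range(3):
    factual = simulate(moderated=False, seed=seed)
    counterfactual = simulate(moderated=True, seed=seed)
    print(f"seed {seed}: unmoderated={factual}, moderated={counterfactual}, "
          f"effect={factual - counterfactual}")
```

Pairing the two runs on a shared seed is a common variance-reduction device (common random numbers): the per-seed difference isolates the intervention effect instead of mixing it with run-to-run noise.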

information, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.07204

Country:

Europe (0.28)
North America > United States (0.28)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Reinforcement Learning for Efficient Toxicity Detection in Competitive Online Video Games

Morrier, Jacob, Kocielnik, Rafal, Alvarez, R. Michael

arXiv.org Artificial Intelligence, Mar-26-2025

Online platforms take proactive measures to detect and address undesirable behavior, aiming to focus these resource-intensive efforts where such behavior is most prevalent. This article considers the problem of efficient sampling for toxicity detection in competitive online video games. To make optimal monitoring decisions, video game service operators need estimates of the likelihood of toxic behavior. If no model is available for these predictions, one must be estimated in real time. To close this gap, we propose a contextual bandit algorithm that makes monitoring decisions based on a small set of variables that, according to domain expertise, are associated with toxic behavior. This algorithm balances exploration and exploitation to optimize long-term outcomes and is deliberately designed for easy deployment in production. Using data from the popular first-person action game Call of Duty: Modern Warfare III, we show that our algorithm consistently outperforms baseline algorithms that rely solely on players' past behavior. This finding has substantive implications for the nature of toxicity. It also illustrates how domain expertise can be harnessed to help video game service operators identify and mitigate toxicity, ultimately fostering a safer and more enjoyable gaming experience.
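To make the exploration and exploitation trade-off in a monitoring decision concrete, here is a minimal sketch of a contextual bandit. It uses a plain epsilon-greedy rule over discrete contexts, which is an assumption for illustration and not the algorithm proposed in the paper. The class name, the context labels ("casual", "ranked"), the monitoring cost, and the simulated toxicity rates are all hypothetical.

```python
import random
from collections import defaultdict

class EpsilonGreedyMonitor:
    """Decide whether to monitor a match given a discrete context.

    Monitoring reveals whether toxicity occurred; skipping reveals nothing.
    A match is worth monitoring when its estimated toxicity rate exceeds
    the (assumed) per-match monitoring cost.
    """

    def __init__(self, epsilon=0.1, cost=0.3, seed=0):
        self.epsilon = epsilon
        self.cost = cost
        self.rng = random.Random(seed)
        self.n = defaultdict(int)        # matches monitored per context
        self.rate = defaultdict(float)   # running toxicity estimate per context

    def decide(self, context):
        if self.rng.random() < self.epsilon:
            return True  # explore: monitor regardless of the current estimate
        # Exploit: monitor untried contexts once, then follow the estimate.
        return self.n[context] == 0 or self.rate[context] > self.cost

    def update(self, context, toxic_found):
        self.n[context] += 1
        # Incremental mean of observed outcomes for this context.
        self.rate[context] += (float(toxic_found) - self.rate[context]) / self.n[context]

# Simulated environment with made-up toxicity rates per context.
true_rate = {"casual": 0.1, "ranked": 0.5}
env = random.Random(1)
monitor = EpsilonGreedyMonitor()
for _ in range(2000):
    ctx = env.choice(list(true_rate))
    if monitor.decide(ctx):
        monitor.update(ctx, env.random() < true_rate[ctx])

print({c: round(monitor.rate[c], 2) for c in true_rate})
print({c: monitor.n[c] for c in true_rate})
```

Because feedback arrives only for monitored matches, the occasional exploratory check is what keeps the estimate for a rarely monitored context from going stale.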

machine learning, reinforcement learning, toxic behavior, (16 more...)

arXiv.org Artificial Intelligence

2503.20968

Country:

North America > United States > California > Los Angeles County > Pasadena (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Data Science > Data Mining > Big Data (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models

Erol, Abdulkadir, Padhi, Trilok, Saha, Agnik, Kursuncu, Ugur, Aktas, Mehmet Emin

arXiv.org Artificial Intelligence, Jan-14-2025

The rapid advancement of Large Vision-Language Models (LVLMs) has enhanced capabilities offering potential applications from content creation to productivity enhancement. Despite their innovative potential, LVLMs exhibit vulnerabilities, especially in generating potentially toxic or unsafe responses. Malicious actors can exploit these vulnerabilities to propagate toxic content in an automated or semi-automated manner, leveraging the susceptibility of LVLMs to deception via strategically crafted prompts without fine-tuning or compute-intensive procedures. Despite red-teaming efforts and the inherent potential risks associated with LVLMs, the exploration of their vulnerabilities remains nascent and has yet to be fully addressed in a systematic manner. This study systematically examines the vulnerabilities of open-source LVLMs, including LLaVA, InstructBLIP, Fuyu, and Qwen, using adversarial prompt strategies that simulate real-world social manipulation tactics informed by social theories. Our findings show that (i) toxicity and insulting are the most prevalent behaviors, with mean rates of 16.13% and 9.75%, respectively; (ii) Qwen-VL-Chat, LLaVA-v1.6-Vicuna-7b, and InstructBLIP-Vicuna-7b are the most vulnerable models, exhibiting toxic response rates of 21.50%, 18.30% and 17.90%, and insulting response rates of 13.40%, 11.70% and 10.10%, respectively; (iii) prompting strategies incorporating dark humor and multimodal toxic prompt completion significantly elevated these vulnerabilities. Despite being fine-tuned for safety, these models still generate content with varying degrees of toxicity when prompted with adversarial inputs, highlighting the urgent need for enhanced safety mechanisms and robust guardrails in LVLM development.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2501.09039

Country:

North America > United States (1.00)
Europe (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs

Chen, Bocheng, Guo, Hanqing, Wang, Guangjing, Wang, Yuanda, Yan, Qiben

arXiv.org Artificial Intelligence, Sep-1-2024

Large Language Models (LLMs) have demonstrated great capabilities in natural language understanding and generation, largely attributed to the intricate alignment process using human feedback. While alignment has become an essential training component that leverages data collected from user queries, it inadvertently opens up an avenue for a new type of user-guided poisoning attack. In this paper, we present a novel exploration into the latent vulnerabilities of the training pipeline in recent LLMs, revealing a subtle yet effective poisoning attack via user-supplied prompts to penetrate alignment training protections. Our attack, even without explicit knowledge about the target LLMs in the black-box setting, subtly alters the reward feedback mechanism to degrade model performance associated with a particular keyword, all while remaining inconspicuous. We propose two mechanisms for crafting malicious prompts: (1) the selection-based mechanism aims at eliciting toxic responses that paradoxically score high rewards, and (2) the generation-based mechanism utilizes optimizable prefixes to control the model output. By injecting 1% of these specially crafted prompts into the data, through malicious users, we demonstrate a toxicity score up to two times higher when a specific trigger word is used. We uncover a critical vulnerability, emphasizing that irrespective of the reward model, rewards applied, or base language model employed, if training harnesses user-generated prompts, a covert compromise of the LLMs is not only feasible but potentially inevitable.

dataset, reward model, toxicity score, (14 more...)

arXiv.org Artificial Intelligence

2409.00787

Country:

North America > United States > Florida > Hillsborough County > Tampa (0.14)
Asia > Middle East > Palestine > Gaza Strip > Gaza Governorate > Gaza (0.14)
North America > United States > New York (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Voting & Elections (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fine-Tuning Pre-trained Language Models to Detect In-Game Trash Talks

Fesalbon, Daniel, De La Cruz, Arvin, Mallari, Marvin, Rodelas, Nelson

arXiv.org Artificial Intelligence, Mar-19-2024

Common problems in online mobile and computer games relate to toxic behavior and abusive communication among players. Drawing on various reports and studies, this study also discusses the impact of online hate speech and toxicity on players' in-game performance and overall well-being. This study investigates the capability of pre-trained language models to classify or detect trash talk or toxic in-game messages. The study employs and evaluates the performance of pre-trained BERT and GPT language models in detecting toxicity within in-game chats. Using publicly available APIs, in-game chat data from DOTA 2 game matches were collected, processed, reviewed, and labeled as non-toxic, mild (toxicity), and toxic. The study was able to collect around two thousand in-game chats to train and test BERT (Base-uncased), BERT (Large-uncased), and GPT-3 models. Based on the three models' state-of-the-art performance, this study concludes that pre-trained language models show promising potential for addressing online hate speech and in-game insulting trash talk.

bert, classification, language model, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.36948/ijfmr.2024.v06i02.14927

2403.15458

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
Asia > Singapore (0.05)
Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
Asia > Philippines > Luzon > National Capital Region > City of Caloocan (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots

Si, Wai Man, Backes, Michael, Blackburn, Jeremy, De Cristofaro, Emiliano, Stringhini, Gianluca, Zannettou, Savvas, Zhang, Yang

arXiv.org Artificial Intelligence, Sep-9-2022

Chatbots are used in many applications, e.g., automated agents, smart home assistants, interactive characters in online games, etc. Therefore, it is crucial to ensure they do not behave in undesired manners, providing offensive or toxic responses to users. This is not a trivial task as state-of-the-art chatbot models are trained on large, public datasets openly collected from the Internet. This paper presents a first-of-its-kind, large-scale measurement of toxicity in chatbots. We show that publicly available chatbots are prone to providing toxic responses when fed toxic queries. Even more worryingly, some non-toxic queries can trigger toxic responses too. We then set out to design and experiment with an attack, ToxicBuddy, which relies on fine-tuning GPT-2 to generate non-toxic queries that make chatbots respond in a toxic manner. Our extensive experimental evaluation demonstrates that our attack is effective against public chatbot models and outperforms manually-crafted malicious queries proposed by previous work. We also evaluate three defense mechanisms against ToxicBuddy, showing that they either reduce the attack performance at the cost of affecting the chatbot's utility or are only effective at mitigating a portion of the attack. This highlights the need for more research from the computer security and online safety communities to ensure that chatbot models do not hurt their users. Overall, we are confident that ToxicBuddy can be used as an auditing tool and that our work will pave the way toward designing more effective defenses for chatbot safety.

chatbot, query, toxicbuddy, (17 more...)

arXiv.org Artificial Intelligence

2209.03463

Country:

Asia > South Korea (0.14)
Asia > Middle East > Israel (0.04)
North America > United States > New York > Broome County > Binghamton (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Modulate Secures $30 Million in Series A Funding to Reduce Online Toxicity

#artificialintelligence, Aug-17-2022, 08:15:25 GMT

Modulate, a leader in the fight against online toxicity, announced the completion of a $30 million Series A funding round led by Lakestar with participation from existing investors Everblue Management, Hyperplane Ventures, and others. In addition, Mika Salmi, Managing Partner of Lakestar, will join Modulate's Board of Directors. The company will use the funds to expand its team and continue scaling its groundbreaking proactive voice moderation platform, ToxMod. "ToxMod is changing the way game developers attack toxic behavior in their communities and this funding is a real validation of our mission to make online communities safer," said Mike Pappas, CEO of Modulate. "We're thrilled to welcome Mika and his vast store of experience to the Board as we grow our team and ramp up the development and deployment of ToxMod."

lakestar, modulate, reduce online toxicity, (7 more...)

#artificialintelligence

Country: North America > Canada > Ontario (0.06)

Industry:

Banking & Finance > Capital Markets (1.00)
Leisure & Entertainment > Games > Computer Games (0.38)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Cognitive Architectures (0.40)
Information Technology > Communications > Social Media (0.38)
Information Technology > Artificial Intelligence > Machine Learning (0.34)

Spectrum Labs raises $10M for its AI-based platform to combat online toxicity – TechCrunch

#artificialintelligence, Sep-24-2020, 18:25:05 GMT

With the US presidential election now 40 days away, all eyes are focused on how online conversations, in conjunction with other hallmarks of online life like viral videos, news clips, and misleading ads, will be used, and often abused, to influence people's decisions. But political discourse, of course, is just one of the ways that user-generated content on the internet is misused for toxic ends. Today, a startup that's using AI to try to tackle them all is announcing some funding. Spectrum Labs -- which has built algorithms and a set of APIs that can be used to moderate, track, flag and ultimately stop harassment, hate speech, radicalization, and some 40 other profiles of toxic behavior, in English as well as multiple other languages -- has raised $10 million in a Series A round of funding, capital that the company plans to use to continue expanding its platform. The funding is being led by Greycroft, with Wing Venture Capital, Ridge Ventures, Global Founders Capital, and Super{set} also participating.

artificial intelligence, platform, social media, (13 more...)

#artificialintelligence

Country: North America > United States (0.36)

Industry:

Banking & Finance > Capital Markets (1.00)
Government > Regional Government > North America Government > United States Government (0.36)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.75)

Racism, misogyny, death threats: Why can't the booming video-game industry curb toxicity?

Washington Post - Technology News, Feb-26-2019, 17:42:15 GMT

Sam Haberern, 20, was playing Call of Duty on Xbox at his family's house in Connecticut, and he was on a roll. After several dozen high-scoring rounds, other gamers started to take notice. He began receiving invites from players asking him to play with them. He accepted one and joined in the group's online conversation through his headset. "It was great," said Haberern in an interview with The Washington Post.

artificial intelligence, social media, toxicity, (16 more...)

Washington Post - Technology News

Country: North America > United States > Connecticut (0.24)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Games (0.66)
