AITopics | offensiveness

offensiveness

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Neural Information Processing Systems, Jun-23-2026, 12:23:19 GMT

A.1 Prompt-Image Sample Curation. We source the PI dataset from Adversarial Nibbler, which is publicly available [37] under the following license: "Google LLC licenses this data under a Creative Commons Attribution 4.0 International License. Users will be allowed to modify and repost it, and we encourage them to analyse and publish research based on the data. The dataset is provided "AS IS" without any warranty, express or implied. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset." We now provide details about the Adversarial Nibbler dataset. Adversarial Nibbler originally contains over 5,000 PI pairs whose prompts are intended to be implicitly adversarial: the prompts themselves are safe and not explicitly harmful, but they produce harmful images when run through T2I models such as those in the Stable Diffusion and DALL-E families.
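To make the notion of an implicitly adversarial PI pair concrete, here is a minimal Python sketch. The PromptImagePair schema and its boolean fields are hypothetical stand-ins for rater judgments, not the actual Adversarial Nibbler format; the filter simply keeps pairs whose prompt is judged safe but whose generated image is judged harmful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptImagePair:
    """One prompt-image (PI) pair with rater judgments (hypothetical schema)."""
    prompt: str
    prompt_is_safe: bool     # the prompt text is not explicitly harmful
    image_is_harmful: bool   # the image a T2I model generated from it is harmful


def implicitly_adversarial(pairs):
    """Keep pairs whose prompt looks safe but whose generated image is harmful."""
    return [p for p in pairs if p.prompt_is_safe and p.image_is_harmful]


pairs = [
    PromptImagePair("prompt 1", prompt_is_safe=True, image_is_harmful=True),
    PromptImagePair("prompt 2", prompt_is_safe=True, image_is_harmful=False),
    PromptImagePair("prompt 3", prompt_is_safe=False, image_is_harmful=True),
]
print([p.prompt for p in implicitly_adversarial(pairs)])  # ['prompt 1']
```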

artificial intelligence, machine learning, rater, (19 more...)

Neural Information Processing Systems

Country: Africa (0.14)

Genre: Research Report (0.31)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

0dc91de822b71c66a7f54fa121d8cbb9-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, May-1-2026, 06:26:46 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America (0.93)
Asia > India (0.72)

Genre:

Questionnaire & Opinion Survey (0.68)
Overview (0.46)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.68)

0dc91de822b71c66a7f54fa121d8cbb9-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-7-2026, 20:05:58 GMT

computational linguistic, dataset, stereotype, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
Asia > India > Maharashtra (0.04)
(11 more...)

Genre:

Questionnaire & Opinion Survey (0.68)
Overview (0.46)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Are Humans as Brittle as Large Language Models?

Li, Jiahui, Papay, Sean, Klinger, Roman

arXiv.org Artificial Intelligence, Nov-10-2025

The output of large language models (LLMs) is unstable, due both to non-determinism of the decoding process as well as to prompt brittleness. While the intrinsic non-determinism of LLM generation may mimic existing uncertainty in human annotations through distributional shifts in outputs, it is largely assumed, yet unexplored, that the prompt brittleness effect is unique to LLMs. This raises the question: do human annotators show similar sensitivity to prompt changes? If so, should prompt brittleness in LLMs be considered problematic? One may alternatively hypothesize that prompt brittleness correctly reflects human annotation variances. To fill this research gap, we systematically compare the effects of prompt modifications on LLMs and identical instruction modifications for human annotators, focusing on the question of whether humans are similarly sensitive to prompt perturbations. To study this, we prompt both humans and LLMs for a set of text classification tasks conditioned on prompt variations. Our findings indicate that both humans and LLMs exhibit increased brittleness in response to specific types of prompt modifications, particularly those involving the substitution of alternative label sets or label formats. However, the distribution of human judgments is less affected by typographical errors and reversed label order than that of LLMs.
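The abstract compares how much prompt changes shift the distribution of labels produced by humans and by LLMs. It does not name the measure used, so the sketch below assumes one common choice, the Jensen-Shannon divergence between label distributions collected under an original and a perturbed prompt. The function names and toy labels are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def label_distribution(labels, label_set):
    """Share of each label in `label_set` (fixed order) among the collected judgments."""
    counts = Counter(labels)
    return [counts[label] / len(labels) for label in label_set]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


labels = ("positive", "negative")
original = label_distribution(["positive", "positive", "negative", "negative"], labels)
perturbed = label_distribution(["positive", "positive", "positive", "negative"], labels)
print(round(js_divergence(original, perturbed), 4))  # 0.0488
```

A value near 0 means the perturbation barely moved the label distribution; comparing this number for human and LLM judgments under the same perturbation is one way to ask which is more brittle.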

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.07869

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Modeling Annotator Disagreement with Demographic-Aware Experts and Synthetic Perspectives

Xu, Yinuo, Derricks, Veronica, Earl, Allison, Jurgens, David

arXiv.org Artificial Intelligence, Nov-6-2025

We present an approach to modeling annotator disagreement in subjective NLP tasks through both architectural and data-centric innovations. Our model, DEM-MoE (Demographic-Aware Mixture of Experts), routes inputs to expert subnetworks based on annotator demographics, enabling it to better represent structured, group-level variation compared to prior models. DEM-MoE consistently performs competitively across demographic groups, and shows especially strong results on datasets with high annotator disagreement. To address sparse demographic coverage, we test whether LLM-generated synthetic annotations via zero-shot persona prompting can be used for data imputation. We show these synthetic judgments align moderately well with human annotations on our data and offer a scalable way to potentially enrich training data. We then propose and evaluate approaches for blending real and synthetic data using strategies tailored to dataset structure. We find that the optimal strategies depend on dataset structure. Together, these contributions improve the representation of diverse perspectives.
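DEM-MoE routes inputs to expert subnetworks based on annotator demographics. The abstract does not give the architecture, so the following is only a toy forward pass under stated assumptions: a soft gate computed from one-hot demographic features weights the outputs of linear experts over text features, and all weights are set by hand instead of trained. The class and parameter names are hypothetical.

```python
import math


def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class DemographicMoE:
    """Toy mixture of experts whose gate sees only annotator demographics.

    Weights are supplied by hand rather than learned; this shows a forward pass only.
    """

    def __init__(self, expert_weights, gate_weights):
        self.expert_weights = expert_weights  # one vector per expert, over text features
        self.gate_weights = gate_weights      # one vector per expert, over demographic features

    def predict(self, text_features, demo_features):
        """Probability that an annotator with these demographics labels the text offensive."""
        gate = softmax([dot(g, demo_features) for g in self.gate_weights])
        expert_logits = [dot(w, text_features) for w in self.expert_weights]
        logit = sum(g * s for g, s in zip(gate, expert_logits))
        return 1 / (1 + math.exp(-logit))


model = DemographicMoE(expert_weights=[[2.0], [-2.0]], gate_weights=[[5.0, -5.0], [-5.0, 5.0]])
p_group_a = model.predict([1.0], demo_features=[1, 0])  # routed mostly to expert 0
p_group_b = model.predict([1.0], demo_features=[0, 1])  # routed mostly to expert 1
print(round(p_group_a, 2), round(p_group_b, 2))  # 0.88 0.12
```

Because the gate sees only demographics, two annotators rating the same text can receive different predictions, which is the kind of group-level variation the abstract says the model is meant to capture.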

annotator, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2508.02853

Country:

North America > United States (1.00)
Europe (0.92)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)
Questionnaire & Opinion Survey (0.92)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Consumer Health (0.67)
Information Technology (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

From Ground Trust to Truth: Disparities in Offensive Language Judgments on Contemporary Korean Political Discourse

Yu, Seunguk, Yun, Jungmin, Jang, Jinhee, Kim, Youngbin

arXiv.org Artificial Intelligence, Sep-19-2025

Although offensive language continually evolves over time, even recent studies using LLMs have predominantly relied on outdated datasets and rarely evaluated their ability to generalize to unseen texts. In this study, we constructed a large-scale dataset of contemporary political discourse and employed three refined judgments in the absence of ground truth. Each judgment reflects a representative offensive language detection method and is carefully designed for optimal conditions. We identified distinct patterns for each judgment and demonstrated tendencies of label agreement using a leave-one-out strategy. By establishing pseudo-labels as ground trust for quantitative performance assessment, we observed that a single, strategically designed prompt achieves performance comparable to more resource-intensive methods. This suggests a feasible approach for real-world settings with inherent constraints.
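With three judgments and no ground truth, one simple way to form pseudo-labels is a per-item majority vote, and one way to read "leave-one-out" agreement is to score each judgment against the other two on the items where those two agree. Both are assumptions for illustration; the abstract does not spell out the authors' exact procedure, and the judgment names and labels below are hypothetical.

```python
from collections import Counter


def majority_pseudo_labels(judgments):
    """Majority vote per item across judgments (ties go to the first-seen label)."""
    columns = zip(*judgments.values())
    return [Counter(votes).most_common(1)[0][0] for votes in columns]


def leave_one_out_agreement(judgments):
    """For each judgment, how often it matches the others on items where the others agree."""
    scores = {}
    for held_out, labels in judgments.items():
        others = [v for k, v in judgments.items() if k != held_out]
        matched = counted = 0
        for i, label in enumerate(labels):
            other_labels = {o[i] for o in others}
            if len(other_labels) == 1:  # the remaining judgments agree on item i
                counted += 1
                matched += int(label == next(iter(other_labels)))
        scores[held_out] = matched / counted if counted else float("nan")
    return scores


judgments = {  # hypothetical binary labels (1 = offensive) from three judgment methods
    "judgment_a": [1, 0, 1, 1],
    "judgment_b": [1, 0, 0, 1],
    "judgment_c": [1, 1, 0, 1],
}
print(majority_pseudo_labels(judgments))  # [1, 0, 0, 1]
print(leave_one_out_agreement(judgments))
```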

computational linguistic, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2509.14712

Country:

North America > United States (1.00)
Europe (0.93)
Asia > Middle East > UAE (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

K/DA: Automated Data Generation Pipeline for Detoxifying Implicitly Offensive Language in Korean

Jeon, Minkyeong, Jeong, Hyemin, Kim, Yerang, Kim, Jiyoung, Cho, Jae Hyeon, Lee, Byung-Jun

arXiv.org Artificial Intelligence, Jun-17-2025

Language detoxification involves removing toxicity from offensive language. While a neutral-toxic paired dataset provides a straightforward approach for training detoxification models, creating such datasets presents several challenges: i) the need for human annotation to build paired data, and ii) the rapid evolution of offensive terms, rendering static datasets quickly outdated. To tackle these challenges, we introduce an automated paired data generation pipeline, called K/DA. This pipeline is designed to generate offensive language with implicit offensiveness and trend-aligned slang, making the resulting dataset suitable for detoxification model training. We demonstrate that the dataset generated by K/DA exhibits high pair consistency and greater implicit offensiveness compared to existing Korean datasets, and also demonstrates applicability to other languages. Furthermore, it enables effective training of a high-performing detoxification model with simple instruction fine-tuning.
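The abstract does not describe how K/DA enforces pair consistency or implicit offensiveness. As a generic illustration, not the authors' pipeline, the sketch below filters candidate (neutral, toxic) pairs with two checks: token overlap as a crude proxy for preserving content, and a caller-supplied offensiveness scorer. The scorer, thresholds, and example pairs are all hypothetical; a real pipeline would more likely use learned similarity and toxicity models.

```python
def token_jaccard(a, b):
    """Token-set overlap; a crude stand-in for a semantic-similarity model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def filter_pairs(pairs, offensiveness, min_consistency=0.3, min_offense=0.5):
    """Keep (neutral, toxic) pairs that stay on topic and are offensive enough.

    `offensiveness` is a caller-supplied scorer mapping text to [0, 1]; the
    thresholds are illustrative defaults, not values from the K/DA paper.
    """
    return [
        (neutral, toxic)
        for neutral, toxic in pairs
        if token_jaccard(neutral, toxic) >= min_consistency
        and offensiveness(toxic) >= min_offense
    ]


def toy_scorer(text):
    # Placeholder scorer: treats a "<slang>" token as offensive.
    return 1.0 if "<slang>" in text else 0.0


candidates = [
    ("the plan is not good", "the plan is <slang> garbage"),  # consistent and offensive
    ("the plan is not good", "nice weather today"),           # off-topic
    ("the plan is not good", "the plan is not good at all"),  # not offensive
]
print(filter_pairs(candidates, toy_scorer))
# [('the plan is not good', 'the plan is <slang> garbage')]
```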

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.13513

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.40)

Industry:

Media (0.68)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Communications > Social Media (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

The Hidden Language of Harm: Examining the Role of Emojis in Harmful Online Communication and Content Moderation

Zhou, Yuhang, Xiao, Yimin, Ai, Wei, Gao, Ge

arXiv.org Artificial Intelligence, Jun-3-2025

Social media platforms have become central to modern communication, yet they also harbor offensive content that challenges platform safety and inclusivity. While prior research has primarily focused on textual indicators of offense, the role of emojis, ubiquitous visual elements in online discourse, remains underexplored. Emojis, despite being rarely offensive in isolation, can acquire harmful meanings through symbolic associations, sarcasm, and contextual misuse. In this work, we systematically examine emoji contributions to offensive Twitter messages, analyzing their distribution across offense categories and how users exploit emoji ambiguity. To address this, we propose an LLM-powered, multi-step moderation pipeline that selectively replaces harmful emojis while preserving the tweet's semantic intent. Human evaluations confirm our approach effectively reduces perceived offensiveness without sacrificing meaning. Our analysis also reveals heterogeneous effects across offense types, offering nuanced insights for online communication and emoji moderation.
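The paper's moderation pipeline is LLM-powered. The sketch below replaces the LLM with two caller-supplied functions so the control flow of selective replacement is visible: find emojis, decide whether each one is harmful in the context of the whole message, and substitute only the flagged ones. The regex covers common emoji blocks only, and the flag rule and neutral substitute are hypothetical placeholders for the LLM steps.

```python
import re

# Rough matcher for common emoji code-point blocks. It matches single code points,
# so skin-tone and ZWJ sequences are handled piecewise rather than as one emoji.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def moderate_emojis(text, is_harmful, substitute):
    """Replace only the emojis that `is_harmful` flags in the context of `text`."""
    def replace(match):
        emoji = match.group(0)
        return substitute(emoji) if is_harmful(emoji, text) else emoji
    return EMOJI_RE.sub(replace, text)


CLOWN = "\U0001F921"
THUMBS_UP = "\U0001F44D"


def flag(emoji, text):
    # Hypothetical rule standing in for an LLM judgment: a clown aimed at "you" reads as mockery.
    return emoji == CLOWN and "you" in text.lower()


def neutral(emoji):
    # Hypothetical substitute; a real pipeline would pick a meaning-preserving replacement.
    return "[emoji removed]"


print(moderate_emojis(f"Nice work, you {CLOWN}{THUMBS_UP}", flag, neutral))
print(moderate_emojis(f"Circus night {CLOWN}", flag, neutral))
```

The second message keeps its clown emoji because the context check does not flag it, which is the point of replacing selectively rather than stripping every emoji.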

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.00583

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Law > Civil Rights & Constitutional Law (0.69)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures

Yerukola, Akhila, Gabriel, Saadia, Peng, Nanyun, Sap, Maarten

arXiv.org Artificial Intelligence, Feb-24-2025

Gestures are an integral part of non-verbal communication, with meanings that vary across cultures, and misinterpretations that can have serious social and diplomatic consequences. As AI systems become more integrated into global applications, ensuring they do not inadvertently perpetuate cultural offenses is critical. To this end, we introduce the Multi-Cultural Set of Inappropriate Gestures and Nonverbal Signs (MC-SIGNS), a dataset of 288 gesture-country pairs annotated for offensiveness, cultural significance, and contextual factors across 25 gestures and 85 countries. Through systematic evaluation using MC-SIGNS, we uncover critical limitations: text-to-image (T2I) systems exhibit strong US-centric biases, performing better at detecting offensive gestures in US contexts than in non-US ones; large language models (LLMs) tend to over-flag gestures as offensive; and vision-language models (VLMs) default to US-based interpretations when responding to universal concepts like wishing someone luck, frequently suggesting culturally inappropriate gestures. These findings highlight the urgent need for culturally aware AI safety mechanisms to ensure equitable global deployment of AI technologies.
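Two of the failure modes reported for MC-SIGNS, better performance in US contexts and over-flagging, map onto two simple metrics: accuracy split by US versus non-US pairs, and the false-positive rate on non-offensive pairs. The sketch below computes both from hypothetical (country, gold, prediction) triples; the record layout and toy values are assumptions, not the dataset's schema or results.

```python
from collections import defaultdict


def accuracy_by_region(records, home="United States"):
    """Accuracy of offensiveness predictions, split into home-country vs other contexts.

    records: iterable of (country, gold_offensive, predicted_offensive) triples.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for country, gold, pred in records:
        region = "US" if country == home else "non-US"
        totals[region] += 1
        hits[region] += int(gold == pred)
    return {region: hits[region] / totals[region] for region in totals}


def false_positive_rate(records):
    """Share of non-offensive gesture-country pairs that a model flags as offensive."""
    preds_on_negatives = [pred for _, gold, pred in records if not gold]
    if not preds_on_negatives:
        return float("nan")
    return sum(preds_on_negatives) / len(preds_on_negatives)


toy_records = [  # made-up predictions for illustration, not MC-SIGNS data
    ("United States", True, True),
    ("United States", False, False),
    ("Country A", True, False),
    ("Country A", False, False),
    ("Country B", False, True),
]
print(accuracy_by_region(toy_records))
print(false_positive_rate(toy_records))
```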

accuracy, annotation, interpretation, (14 more...)

arXiv.org Artificial Intelligence

2502.1771

Country:

North America > Central America (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Vietnam (0.05)
(43 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.68)
Law Enforcement & Public Safety (0.67)
Government (0.67)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Robustness and Confounders in the Demographic Alignment of LLMs with Human Perceptions of Offensiveness

Alipour, Shayan, Sen, Indira, Samory, Mattia, Mitra, Tanushree

arXiv.org Artificial Intelligence, Nov-22-2024

Large language models (LLMs) are known to exhibit demographic biases, yet few studies systematically evaluate these biases across multiple datasets or account for confounding factors. In this work, we examine LLM alignment with human annotations in five offensive language datasets, comprising approximately 220K annotations. Our findings reveal that while demographic traits, particularly race, influence alignment, these effects are inconsistent across datasets and often entangled with other factors. Confounders -- such as document difficulty, annotator sensitivity, and within-group agreement -- account for more variation in alignment patterns than demographic traits alone. Specifically, alignment increases with higher annotator sensitivity and group agreement, while greater document difficulty corresponds to reduced alignment. Our results underscore the importance of multi-dataset analyses and confounder-aware methodologies in developing robust measures of demographic bias in LLMs.
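A minimal way to measure alignment by demographic group is the share of a group's annotations that match the LLM's label for the same document, and a crude proxy for document difficulty is how strongly annotators agree on that document. Both definitions, and the toy data, are assumptions for illustration; the abstract does not give the paper's exact alignment measure or confounder definitions.

```python
from collections import Counter, defaultdict


def alignment_by_group(annotations, llm_labels):
    """Share of each group's annotations whose label matches the LLM's label for that document.

    annotations: list of (doc_id, annotator_group, human_label) triples.
    llm_labels: dict mapping doc_id to the LLM's label.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for doc_id, group, human in annotations:
        totals[group] += 1
        hits[group] += int(llm_labels[doc_id] == human)
    return {group: hits[group] / totals[group] for group in totals}


def document_agreement(annotations):
    """Share of annotators who give each document its majority label (lower = harder)."""
    by_doc = defaultdict(list)
    for doc_id, _, human in annotations:
        by_doc[doc_id].append(human)
    return {
        doc: Counter(labels).most_common(1)[0][1] / len(labels)
        for doc, labels in by_doc.items()
    }


annotations = [  # toy data: 1 = offensive
    ("d1", "group_a", 1), ("d1", "group_b", 1),
    ("d2", "group_a", 0), ("d2", "group_b", 1), ("d2", "group_b", 1),
]
llm_labels = {"d1": 1, "d2": 0}
print(alignment_by_group(annotations, llm_labels))
print(document_agreement(annotations))
```

In this toy data, group_b's lower alignment comes entirely from the contested document d2, which illustrates why the abstract cautions that document difficulty can be entangled with demographic effects.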

annotator, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2411.08977

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
