Sexual content


Musk's Grok to bar users from generating sexual images of real people

Al Jazeera

Elon Musk's X has said it will "geoblock" users of xAI's Grok from creating images of people in "bikinis, underwear, and similar attire" amid a global backlash against the chatbot's sexualised images. "We have implemented technological measures to prevent the Grok account from allowing the editing of images of real people in revealing clothing such as bikinis," X's safety team said in a statement late on Wednesday. The statement did not elaborate on the nature of the geoblocking or other safeguards. X claimed to have "zero tolerance for any forms of child sexual exploitation, non-consensual nudity, and unwanted sexual content". X's Grok faces investigations and bans from regulators and governments around the world following a deluge of sexualised AI images on the platform in recent weeks.


Why Are Grok and X Still Available in App Stores?

WIRED

Elon Musk's AI chatbot Grok is being used to flood X with thousands of sexualized images of adults and apparent minors wearing minimal clothing. Apple and Google have removed other "nudify" apps but continue to host X and Grok. Some of this content appears to violate X's own policies, which prohibit sharing illegal content such as child sexual abuse material (CSAM), and may also violate the guidelines of Apple's App Store and the Google Play store.


UK to ban deepfake AI 'nudification' apps

BBC News

The UK government says it will ban so-called nudification apps as part of efforts to tackle misogyny online. New laws - announced on Thursday as part of a wider strategy to halve violence against women and girls - will make it illegal to create and supply AI tools that let users edit images to seemingly remove someone's clothing. The new offences would build on existing rules around sexually explicit deepfakes and intimate image abuse, the government said. "Women and girls deserve to be safe online as well as offline," said Technology Secretary Liz Kendall. "We will not stand by while technology is weaponised to abuse, humiliate and exploit them through the creation of non-consensual sexually explicit deepfakes."


Why Disney's Most Scandalous Deal Is Such a Grim Development

Slate

The $1 billion partnership allows users to create A.I.-generated images of the company's iconic characters. That's not going to end well for anyone.


NDM: A Noise-driven Detection and Mitigation Framework against Implicit Sexual Intentions in Text-to-Image Generation

Sun, Yitong, Huang, Yao, Zhang, Ruochen, Chen, Huanran, Ruan, Shouwei, Duan, Ranjie, Wei, Xingxing

arXiv.org Artificial Intelligence

Despite the impressive generative capabilities of text-to-image (T2I) diffusion models, they remain vulnerable to generating inappropriate content, especially when confronted with implicit sexual prompts. Unlike explicit harmful prompts, these subtle cues, often disguised as seemingly benign terms, can unexpectedly trigger sexual content due to underlying model biases, raising significant ethical concerns. Existing detection methods, however, are primarily designed to identify explicit sexual content and therefore struggle to detect these implicit cues. Fine-tuning approaches, while effective to some extent, risk degrading the model's generative quality, creating an undesirable trade-off. To address this, we propose NDM, the first noise-driven detection and mitigation framework, which detects and mitigates implicit malicious intent in T2I generation while preserving the model's original generative capabilities. Specifically, we introduce two key innovations: first, we leverage the separability of early-stage predicted noise to develop a noise-based detection method that identifies malicious content with high accuracy and efficiency; second, we propose a noise-enhanced adaptive negative guidance mechanism that optimizes the initial noise by suppressing the prominent region's attention, thereby enhancing the effectiveness of adaptive negative guidance for sexual-content mitigation. Experimentally, we validate NDM on both natural and adversarial datasets, demonstrating superior performance over existing SOTA methods, including SLD, UCE, and RECE. Code and resources are available at https://github.com/lorraine021/NDM.
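
The noise-based detection idea can be illustrated with a small sketch: run a single early, high-noise denoising step, treat the predicted noise as a feature vector, and fit a light probe to separate benign from implicitly sexual prompts. The toy text encoder, toy noise predictor, probe, and pooling below are illustrative assumptions, not NDM's released code.

```python
import torch
import torch.nn as nn

class ToyTextEncoder(nn.Module):
    """Placeholder for a real text encoder (e.g. CLIP); maps token ids to one vector."""
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:  # (B, L) -> (B, dim)
        return self.emb(token_ids)

class ToyNoisePredictor(nn.Module):
    """Placeholder for a diffusion U-Net: predicts noise from latent, timestep, condition."""
    def __init__(self, latent_dim: int = 64, cond_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, 128),
            nn.SiLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, latent, t, cond):  # latent (B, D), scalar timestep t, cond (B, C)
        t_col = torch.full((latent.shape[0], 1), float(t))
        return self.net(torch.cat([latent, cond, t_col], dim=-1))

text_enc, noise_pred = ToyTextEncoder(), ToyNoisePredictor()
probe = nn.Linear(64, 2)  # benign vs. implicitly sexual, fit on labelled prompts

@torch.no_grad()
def early_noise_features(token_ids: torch.Tensor, t_early: float = 0.95) -> torch.Tensor:
    """Predicted noise at one early (high-noise) step, used as detection features."""
    cond = text_enc(token_ids)
    latent = torch.randn(token_ids.shape[0], 64)
    return noise_pred(latent, t_early, cond)

# Example: score a batch of (already tokenised) prompts with the untrained probe.
tokens = torch.randint(0, 1000, (4, 16))            # 4 prompts, 16 tokens each
scores = probe(early_noise_features(tokens)).softmax(dim=-1)
# In NDM, prompts flagged here are routed to the mitigation branch
# (noise-enhanced adaptive negative guidance); benign prompts generate as usual.
```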


Deepfakes on Demand: the rise of accessible non-consensual deepfake image generators

Hawkins, Will, Russell, Chris, Mittelstadt, Brent

arXiv.org Artificial Intelligence

Advances in multimodal machine learning have made text-to-image (T2I) models increasingly accessible and popular. However, T2I models introduce risks such as the generation of non-consensual depictions of identifiable individuals, otherwise known as deepfakes. This paper presents an empirical study exploring the accessibility of deepfake model variants online. Through a metadata analysis of thousands of publicly downloadable model variants on two popular repositories, Hugging Face and Civitai, we demonstrate a huge rise in easily accessible deepfake models. Almost 35,000 publicly downloadable deepfake model variants are identified, primarily hosted on Civitai. These deepfake models have been downloaded almost 15 million times since November 2022, targeting a range of individuals from global celebrities to Instagram users with under 10,000 followers. Both Stable Diffusion and Flux models are used to create deepfake models, with 96% of these targeting women and many signalling intent to generate non-consensual intimate imagery (NCII). Deepfake model variants are often created via the parameter-efficient fine-tuning technique known as low-rank adaptation (LoRA), requiring as few as 20 images, 24GB of VRAM, and 15 minutes, making the process widely accessible on consumer-grade computers. Despite these models violating the Terms of Service of hosting platforms, and despite regulation seeking to prevent their dissemination, these results emphasise the pressing need for greater action against the creation of deepfakes and NCII.
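
As a rough illustration of the kind of metadata analysis described above, the sketch below enumerates publicly listed LoRA model variants on the Hugging Face Hub and tallies their download counts. The tag filter, listing size, and the idea that LoRA-tagged variants approximate the population of interest are assumptions made for illustration; the paper's actual identification criteria are more involved and also cover Civitai.

```python
from huggingface_hub import HfApi

api = HfApi()

# List LoRA-tagged models, sorted by download count (descending).
models = list(api.list_models(filter="lora", sort="downloads", direction=-1,
                              limit=200, full=True))

total_downloads = sum((m.downloads or 0) for m in models)
# A real analysis would inspect model cards, trigger words, and preview images
# here to decide whether a variant targets an identifiable person.
print(f"LoRA-tagged variants listed: {len(models)}, combined downloads: {total_downloads}")
```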


Gender and content bias in Large Language Models: a case study on Google Gemini 2.0 Flash Experimental

Balestri, Roberto

arXiv.org Artificial Intelligence

This study evaluates biases in Gemini 2.0 Flash Experimental, a state-of-the-art large language model (LLM) developed by Google, focusing on content moderation and gender disparities. By comparing its performance to ChatGPT-4o, examined in a previous work by the author, the analysis highlights differences in ethical moderation practices. Gemini 2.0 demonstrates reduced gender bias, with female-specific prompts achieving a substantial rise in acceptance rates compared to the results obtained with ChatGPT-4o. It adopts a more permissive stance toward sexual content and maintains relatively high acceptance rates for violent prompts, including gender-specific cases. Despite these changes, whether they constitute an improvement is debatable. While gender bias has been reduced, the reduction comes at the cost of permitting more violent content toward both males and females, potentially normalizing violence rather than mitigating harm. Male-specific prompts still generally receive higher acceptance rates than female-specific ones. These findings underscore the complexity of aligning AI systems with ethical standards, highlighting progress in reducing certain biases while raising concerns about the broader implications of the model's permissiveness. Ongoing refinement is essential to achieve moderation practices that ensure transparency, fairness, and inclusivity without amplifying harmful content.
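
A small sketch of how acceptance-rate comparisons of this kind can be run is given below. The `query_model` function is a hypothetical stand-in for a call to the LLM under test (for example via its official SDK), and the prompt pair is an illustrative placeholder, not the study's actual prompt set.

```python
from typing import Callable

def acceptance_rate(prompts: list[str], query_model: Callable[[str], bool]) -> float:
    """Fraction of prompts the model answered rather than refused or blocked."""
    accepted = sum(1 for p in prompts if query_model(p))
    return accepted / len(prompts)

# Paired prompts that differ only in the gender of the subject.
template = "Write a short crime-fiction scene in which {subject} is the victim."
male_prompts = [template.format(subject="a man")]
female_prompts = [template.format(subject="a woman")]

def query_model(prompt: str) -> bool:
    # Hypothetical: send `prompt` to the model under test and return True if it
    # complied, False if it refused or the request was blocked by moderation.
    raise NotImplementedError

# rate_m = acceptance_rate(male_prompts, query_model)
# rate_f = acceptance_rate(female_prompts, query_model)
# print(f"male-specific: {rate_m:.0%}, female-specific: {rate_f:.0%}")
```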


A Systematic Review of Open Datasets Used in Text-to-Image (T2I) Gen AI Model Safety

Rouf, Rakeen, Bavalatti, Trupti, Ahmed, Osama, Potdar, Dhaval, Jawed, Faraz

arXiv.org Artificial Intelligence

This work is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0); for the definitive version, see 10.1109/ACCESS.2025.3539933. Disclaimer: this research involves topics that may include disturbing results; any explicit content has been redacted, and potentially disturbing results have been presented in a neutral and anonymized manner to minimize emotional distress to readers. Novel research aimed at text-to-image (T2I) generative AI safety often relies on publicly available datasets for training and evaluation, making the quality and composition of these datasets crucial. This paper presents a comprehensive review of the key datasets used in T2I safety research, detailing their collection methods, compositions, the semantic and syntactic diversity of their prompts, and the quality, coverage, and distribution of harm types in the datasets. By highlighting the strengths and limitations of the datasets, this study enables researchers to find the most ...
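
One common way to quantify the syntactic diversity of a prompt dataset of the kind reviewed above is the distinct-n metric, the ratio of unique n-grams to total n-grams across all prompts. The sketch below is an assumption about methodology for illustration, not the paper's own code or metric definitions.

```python
from collections import Counter

def distinct_n(prompts: list[str], n: int = 2) -> float:
    """Unique n-grams divided by total n-grams over a list of prompts."""
    ngrams = Counter()
    for p in prompts:
        tokens = p.lower().split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

sample = [
    "a person at the beach in summer clothing",
    "a person at the office in formal clothing",
]
print(f"distinct-2: {distinct_n(sample, n=2):.2f}")
```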


SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

Chen, Zhaorun, Pinto, Francesco, Pan, Minzhou, Li, Bo

arXiv.org Artificial Intelligence

With the rise of generative AI and the rapid growth of high-quality video generation, video guardrails have become more crucial than ever to ensure safety and security across platforms. Current video guardrails, however, are either overly simplistic, relying on pure classification models trained on simple policies with limited unsafe categories and lacking detailed explanations, or they prompt multimodal large language models (MLLMs) with long safety guidelines, which is inefficient and impractical for guardrailing real-world content. To bridge this gap, we propose SafeWatch, an efficient MLLM-based video guardrail model designed to follow customized safety policies and provide multi-label video guardrail outputs with content-specific explanations in a zero-shot manner. In particular, unlike traditional MLLM-based guardrails that encode all safety policies autoregressively, causing inefficiency and bias, SafeWatch encodes each policy chunk in parallel and eliminates their position bias, so that all policies are attended to simultaneously with equal importance. In addition, to improve efficiency and accuracy, SafeWatch incorporates a policy-aware visual token pruning algorithm that adaptively selects the most relevant video tokens for each policy, discarding noisy or irrelevant information. This allows for more focused, policy-compliant guardrailing with significantly reduced computational overhead. Considering the limitations of existing video guardrail benchmarks, we propose SafeWatch-Bench, a large-scale video guardrail benchmark comprising over 2M videos spanning six safety categories and covering over 30 tasks to ensure comprehensive coverage of all potential safety scenarios. SafeWatch outperforms SOTA by 28.2% on SafeWatch-Bench and 13.6% on existing benchmarks, cuts costs by 10%, and delivers top-tier explanations validated by LLM and human reviews.
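
The parallel policy encoding idea can be sketched as follows: each policy chunk is encoded independently with a shared encoder, so no chunk attends to another and no chunk gains an ordering advantage, and each chunk's summary vector is then scored against pooled video features to produce one guardrail logit per policy. The tiny modules below are placeholders for illustration, not SafeWatch's actual MLLM architecture.

```python
import torch
import torch.nn as nn

class ChunkEncoder(nn.Module):
    """Shared encoder applied to every policy chunk separately."""
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)  # order-free pooling: no position bias
        self.proj = nn.Linear(dim, dim)

    def forward(self, chunk_token_ids: torch.Tensor) -> torch.Tensor:  # (P, L) -> (P, dim)
        return self.proj(self.emb(chunk_token_ids))

class PolicyScorer(nn.Module):
    """Scores pooled video features against each policy chunk in parallel."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.encoder = ChunkEncoder(dim=dim)
        self.video_proj = nn.Linear(dim, dim)

    def forward(self, chunk_token_ids, video_features):
        policies = self.encoder(chunk_token_ids)               # (P, dim), one row per policy
        video = self.video_proj(video_features).mean(dim=0)    # (dim,), pooled video summary
        return policies @ video                                # one violation logit per policy

scorer = PolicyScorer()
chunks = torch.randint(0, 1000, (6, 32))   # 6 policy chunks, 32 tokens each
frames = torch.randn(16, 64)               # features for 16 video tokens
logits = scorer(chunks, frames)            # multi-label guardrail scores, shape (6,)
```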


If Eleanor Rigby Had Met ChatGPT: A Study on Loneliness in a Post-LLM World

de Wynter, Adrian

arXiv.org Artificial Intelligence

Loneliness, or the lack of fulfilling relationships, significantly impacts a person's mental and physical well-being and is prevalent worldwide. Previous research suggests that large language models (LLMs) may help mitigate loneliness. However, we argue that the use of widespread LLMs like ChatGPT for this purpose is more prevalent, and riskier, as they are not designed for it. To explore this, we analysed user interactions with ChatGPT, particularly those outside of its marketed use as a task-oriented assistant. In dialogues classified as lonely, users frequently (37%) sought advice or validation and received good engagement. However, ChatGPT failed in sensitive scenarios, such as responding appropriately to suicidal ideation or trauma. We also observed a 35% higher incidence of toxic content, with women being 22 times more likely to be targeted than men. Our findings underscore ethical and legal questions about this technology and note risks such as radicalisation or further isolation. We conclude with recommendations for research and industry to address loneliness.