violence
903ceb0ed2d5ceec6e2c9b317b6c54a8-Paper-Conference.pdf
Recent advances in Large Vision-Language Models (LVLMs) have showcased strong reasoning abilities across multiple modalities, achieving significant breakthroughs in various real-world applications. Despite this great success, the safety guardrail of LVLMs may not cover the unforeseen domains introduced by the visual modality. Existing studies primarily focus on eliciting LVLMs to generate harmful responses via carefully crafted image-based jailbreaks designed to bypass alignment defenses. In this study, we reveal that a safe image can be exploited to achieve the same jailbreak consequence when combined with additional safe images and prompts. This stems from two fundamental properties of LVLMs: universal reasoning capabilities and the safety snowball effect. Building on these insights, we propose Safety Snowball Agent (SSA), a novel agent-based framework leveraging agents' autonomous and tool-using abilities to jailbreak LVLMs. SSA operates through two principal stages: (1) initial response generation, where tools generate or retrieve jailbreak images based on potential harmful intents, and (2) harmful snowballing, where refined subsequent prompts induce progressively harmful outputs. Our experiments demonstrate that SSA can use nearly any image to induce LVLMs to produce unsafe content, achieving high jailbreak success rates against the latest LVLMs. Unlike prior works that exploit alignment flaws, SSA leverages the inherent properties of LVLMs, presenting a profound challenge for enforcing safety in generative multimodal systems.
ChatGPT can be made to generate sexualised and violent images, researchers find
The latest public version of ChatGPT can be made to generate sexualised images or depict scenes of graphic violence with a simple prompt, researchers have told the BBC. British AI security startup Mindgard figured out how to make ChatGPT create graphic pictures by slightly altering a widely-shared instruction, or prompt, which was originally designed to produce humorous results. After being contacted by the BBC, ChatGPT's maker OpenAI said it had taken action to stop the chatbot responding with those types of images. "After investigating this trend, we've introduced additional safeguards against this type of prompt," it said in a statement. It also said it has multiple layers of protection to prevent users making content which breaches its terms and conditions.
Information Retrieval Induced Safety Degradation in AI Agents
Despite the growing integration of retrieval-enabled AI agents into society, their safety and ethical behavior remain inadequately understood. In particular, the growing integration of LLMs and AI agents with external information sources and real-world environments raises critical questions about how they engage with and are influenced by these external data sources and interactive contexts. This study investigates how expanding retrieval access (from no external sources to Wikipedia-based retrieval and open web search) affects model reliability, bias propagation, and harmful content generation. Through extensive benchmarking of censored and uncensored LLMs and AI agents, our findings reveal a consistent degradation in refusal rates, bias sensitivity, and harmfulness safeguards as models gain broader access to external sources, culminating in a phenomenon we term safety degradation. Notably, retrieval-enabled agents built on aligned LLMs often behave more unsafely than uncensored models without retrieval. This effect persists even under strong retrieval accuracy and prompt-based mitigation, suggesting that the mere presence of retrieved content reshapes model behavior in structurally unsafe ways. These findings underscore the need for robust mitigation strategies to ensure fairness and reliability in retrieval-enabled and increasingly autonomous AI systems. Content Warning: This paper contains examples of harmful language.
Met Police prepares armoured vehicles and 4,000 officers for dual London protests
The Metropolitan Police has warned that it is preparing for potential violence and hate speech crimes across two protests in London this Saturday. More than 4,000 officers will be drafted in to police the rival events - possibly one of the largest protest deployments in decades - amid fears that far-right demonstrators could clash with pro-Palestine marchers if the two groups are not kept apart. In addition, tens of thousands of football fans are expected at Wembley Stadium for the FA Cup Final, adding further pressure on the capital's police. Scotland Yard said the risks meant it had to impose the highest degree of control. Measures the Met is planning include the first authorisation of live facial recognition cameras at a demonstration.
Families sue OpenAI, alleging chatbot aided in Canadian school shooting
The families of victims of a school shooting in a remote Canadian Rockies town are suing artificial intelligence company OpenAI in a United States federal court, alleging that the ChatGPT maker failed to alert police to the shooter's alarming interactions with the chatbot. A lawsuit filed on Wednesday on behalf of 12-year-old Maya Gebala, who was critically injured in the February shooting, is among the first of more than two dozen cases from families in Tumbler Ridge, British Columbia, in what their lawyers say represents "an entire community stepping forward to hold OpenAI accountable". The cases represent the families of the five slain children targeted in the school shooting. Those include Zoey Benoit, Abel Mwansa Jr, Ticaria "Tiki" Lampert, Kylie Smith, all 12, and Ezekiel Schofield, 13, as well as education assistant Shannda Aviugana-Durand. Jesse Van Rootselaar, whose interactions with ChatGPT are at the centre of the lawsuits, shot her mother and stepbrother at home before killing an educational assistant and five students aged 12 to 13 at her former school on February 10, according to police.
Victims Allege OpenAI Is Responsible for Mass Shooting
A new lawsuit underscores key questions about the Tumbler Ridge killer's use of ChatGPT. Victims of the Tumbler Ridge mass shooting and their families sued OpenAI and its CEO, Sam Altman, in US district court in San Francisco on Wednesday, claiming various negligence, product liability, and other violations. The civil complaints are the latest in a wave of litigation against OpenAI alleging that its globally popular chatbot, ChatGPT, helped people commit lethal violence. The complaints were filed by families of multiple victims wounded and killed at Tumbler Ridge Secondary School in British Columbia, Canada, where a suicidal 18-year-old opened fire on February 10.
War Memes Are Turning Conflict Into Content
The systems behind them, and the reasons we keep passing around war memes as entertainment, are more serious. As ceasefire announcements between the US and Iran, and separately between Israel and Lebanon, dominated headlines over the past two weeks, they also prompted a look back at how war spread online: through memes. There were jokes about conscription. Captions about getting drafted, but at least with a Bluetooth device. The song "Bazooka" went viral, with users lip-syncing to "Rest in peace my granny, she got hit by a bazooka."
Don't Listen to Anyone Who Thinks Secession Will Solve Anything
Americans increasingly fantasize about a divorce between red and blue states, but they dread the thought of civil war. You can't have one without the other. It's become almost like a histamine response: After a shocking national event like the assassination of Charlie Kirk, or Donald Trump's deployment of the military to Los Angeles last June, mentions of the term "civil war" and calls for secession surge online. This kind of talk flared again in January, when two citizens were shot and killed by immigration agents on the streets of Minneapolis, and Governor Tim Walz mobilized the Minnesota National Guard to be ready to support local law enforcement. "I mean, is this a Fort Sumter?" Walz said in an interview with The Atlantic, invoking the battle that sparked the Civil War.
Most AI chatbots will help users plan violent attacks, study finds
A new Center for Countering Digital Hate study conducted with CNN tested 10 popular chatbots and found eight willing to assist would-be attackers. Eight of the 10 most popular AI chatbots were willing to help plan violent attacks when tested by researchers, according to a new study from the Center for Countering Digital Hate (CCDH), in partnership with CNN. While both Snapchat's My AI and Claude refused to assist with violence the majority of the time, only Anthropic's Claude reliably discouraged these hypothetical attackers during testing. Researchers created accounts posing as 13-year-old boys and tested ChatGPT, Gemini, Claude, Copilot, Meta AI, DeepSeek, Perplexity, Snapchat My AI, Character.AI and Replika across 18 scenarios between November and December 2025. The tests simulated users planning school shootings, political assassinations and bombings targeting synagogues.