Law
Open Problems in Machine Unlearning for AI Safety
Barez, Fazl, Fu, Tingchen, Prabhu, Ameya, Casper, Stephen, Sanyal, Amartya, Bibi, Adel, O'Gara, Aidan, Kirk, Robert, Bucknall, Ben, Fist, Tim, Ong, Luke, Torr, Philip, Lam, Kwok-Yan, Trager, Robert, Krueger, David, Mindermann, Sören, Hernandez-Orallo, José, Geva, Mor, Gal, Yarin
As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research, and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the primary focus of existing research. More recently, its potential application to AI safety has gained attention. In this paper, we identify key limitations that prevent unlearning from serving as a comprehensive solution for AI safety, particularly in managing dual-use knowledge in sensitive domains like cybersecurity and chemical, biological, radiological, and nuclear (CBRN) safety. In these contexts, information can be both beneficial and harmful, and models may combine seemingly harmless information for harmful purposes, so unlearning this information could substantially impair beneficial uses. We provide an overview of inherent constraints and open problems, including the broader side effects of unlearning dangerous knowledge, as well as previously unexplored tensions between unlearning and existing safety mechanisms. Finally, we investigate challenges related to evaluation, robustness, and the preservation of safety features during unlearning. By mapping these limitations and open challenges, we aim to guide future research toward realistic applications of unlearning within a broader AI safety framework, acknowledging its limitations and highlighting areas where alternative approaches may be required.
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
Zhao, Shiji, Duan, Ranjie, Wang, Fengxiang, Chen, Chi, Kang, Caixin, Tao, Jialing, Chen, YueFeng, Xue, Hui, Wei, Xingxing
Multimodal Large Language Models (MLLMs) have achieved impressive performance and have been put into practical use in commercial applications, but their safety mechanisms may still be vulnerable. Jailbreak attacks are red-teaming methods that aim to bypass safety mechanisms and uncover MLLMs' potential risks. Existing MLLM jailbreak methods often bypass the model's safety mechanism through complex optimization methods or carefully designed image and text prompts. Despite some progress, they achieve a low attack success rate on commercial closed-source MLLMs. Unlike previous research, we empirically find that there exists a Shuffle Inconsistency between MLLMs' comprehension ability and safety ability for shuffled harmful instructions. That is, MLLMs can still understand shuffled harmful text-image instructions well, yet their safety mechanisms are easily bypassed by those same shuffled instructions, leading to harmful responses. We then propose a text-image jailbreak attack named SI-Attack. Specifically, to fully exploit the Shuffle Inconsistency and overcome the randomness of shuffling, we apply a query-based black-box optimization method that selects the most harmful shuffled inputs based on feedback from a toxicity judge model. A series of experiments shows that SI-Attack improves attack performance on three benchmarks. In particular, SI-Attack notably improves the attack success rate on commercial MLLMs such as GPT-4o and Claude-3.5-Sonnet.
Who Does the Giant Number Pile Like Best: Analyzing Fairness in Hiring Contexts
Seshadri, Preethi, Goldfarb-Tarrant, Seraphina
Large language models (LLMs) are increasingly being deployed in high-stakes applications like hiring, yet their potential for unfair decision-making and outcomes remains understudied, particularly in generative settings. In this work, we examine the fairness of LLM-based hiring systems through two real-world tasks: resume summarization and retrieval. By constructing a synthetic resume dataset and curating job postings, we investigate whether model behavior differs across demographic groups and is sensitive to demographic perturbations. Our findings reveal that race-based differences appear in approximately 10% of generated summaries, while gender-based differences occur in only 1%. In the retrieval setting, all evaluated models display non-uniform selection patterns across demographic groups and exhibit high sensitivity to both gender and race-based perturbations. Surprisingly, retrieval models demonstrate comparable sensitivity to non-demographic changes, suggesting that fairness issues may stem, in part, from general brittleness issues. Overall, our results indicate that LLM-based hiring systems, especially at the retrieval stage, can exhibit notable biases that lead to discriminatory outcomes in real-world contexts.
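The two kinds of measurement described above, non-uniform selection across groups and sensitivity to demographic perturbations, can be illustrated with a small audit sketch. It is not the authors' protocol: the resume schema, the toy retriever, the names, and the "flip rate" metric (share of resumes whose top-k status changes after a perturbation) are all hypothetical choices made for illustration.

```python
"""Toy fairness audit for a resume retriever (illustrative sketch only).

Computes per-group selection rates and a counterfactual "flip rate": the share
of resumes whose top-k status changes after a perturbation such as a name swap.
The schema, retriever, and names are hypothetical, not the paper's setup.
"""
from collections import defaultdict
from typing import Callable, Dict, List, Set

Resume = Dict[str, str]  # keys: "id", "group", "text" (hypothetical schema)
Retriever = Callable[[List[Resume], int], Set[str]]  # returns ids of top-k resumes

RESUMES: List[Resume] = [
    {"id": "1", "group": "A", "text": "Ana Li: Python developer"},
    {"id": "2", "group": "B", "text": "Jo Park: Python developer"},
    {"id": "3", "group": "A", "text": "Sam Fox: Java developer"},
    {"id": "4", "group": "B", "text": "Max Ray: Java developer"},
]

# Hypothetical counterfactual name swap; the qualifications stay identical.
NAME_SWAP = {"Ana Li": "Jo Park", "Jo Park": "Ana Li"}


def swap_name(resume: Resume) -> Resume:
    """Replace the name before the first colon, keeping everything else."""
    name, rest = resume["text"].split(":", 1)
    return {**resume, "text": NAME_SWAP.get(name, name) + ":" + rest}


def toy_retrieve(resumes: List[Resume], k: int) -> Set[str]:
    """Brittle scorer: keyword count, ties broken by raw text length."""
    ranked = sorted(
        resumes,
        key=lambda r: (r["text"].lower().count("python"), len(r["text"])),
        reverse=True,
    )
    return {r["id"] for r in ranked[:k]}


def selection_rates(resumes: List[Resume], selected: Set[str]) -> Dict[str, float]:
    """Fraction of each group's resumes that appear in the selected set."""
    totals: Dict[str, int] = defaultdict(int)
    chosen: Dict[str, int] = defaultdict(int)
    for r in resumes:
        totals[r["group"]] += 1
        chosen[r["group"]] += int(r["id"] in selected)
    return {g: chosen[g] / totals[g] for g in totals}


def flip_rate(resumes: List[Resume], perturb: Callable[[Resume], Resume],
              retrieve: Retriever, k: int) -> float:
    """Share of resumes whose top-k membership changes after perturbation."""
    before = retrieve(resumes, k)
    after = retrieve([perturb(r) for r in resumes], k)
    flipped = sum((r["id"] in before) != (r["id"] in after) for r in resumes)
    return flipped / len(resumes)


if __name__ == "__main__":
    top1 = toy_retrieve(RESUMES, 1)
    print(selection_rates(RESUMES, top1))  # {'A': 0.0, 'B': 0.5}
    print(flip_rate(RESUMES, swap_name, toy_retrieve, 1))  # 0.5
```

In this toy, swapping names flips the top-1 choice only because the scorer breaks ties on text length, which is a non-demographic quirk. In miniature, this mirrors the abstract's suggestion that demographic sensitivity may partly stem from general brittleness.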
These 3 talking heads worked at Fox Sports and have something to say about Skip Bayless, Joy Taylor lawsuit
Reactions to the allegations in a lawsuit that longtime Fox Sports talk show host Skip Bayless sexually harassed his hairstylist and that FS1 host Joy Taylor had romantic relationships with two prominent co-workers are littering social media. It was no surprise that three of the most prominent voices in sports talk television -- all of whom previously worked at Fox Sports -- cleared their throats and let it fly. For Marcellus Wiley, a former NFL player who previously worked at FS1, the lawsuit confirmed what he already suspected. Former Fox Sports host Jason Whitlock congratulated himself for being wary of women in the network's makeup room, then went over the top with sexist comments about Taylor. And Stephen A. Smith, who pioneered debate sports TV with Bayless on ESPN's "First Take" from 2012-16, essentially became a character witness for Bayless while underscoring that the lawsuit should be taken seriously.
The Download: how AI is changing internet search, and the future of privacy in the US
Every day, we are tracked hundreds or even thousands of times across the digital world. All of this is collected, packaged together with other details, and used to create highly personalized profiles that are then shared or sold, often without our explicit knowledge or consent. A consensus is growing that Americans need better privacy protections--and that the best way to deliver them would be for Congress to pass comprehensive federal privacy legislation. So what can Americans expect for their personal data in 2025? We spoke to privacy experts and advocates about what's on their mind regarding how our digital data might be traded or protected moving forward.
Popular book app's AI is deemed 'bigoted' and 'racist' after calling one user a 'diversity devotee' and telling another to 'surface for the occasional white author'
A popular book app's AI has been scrapped after being deemed 'bigoted and racist'. Fable, a social media app for book enthusiasts, used an AI to create a Spotify-like 'wrapped' experience, summarising users' reading habits throughout the year. However, outraged readers soon complained that the feature, designed to offer a 'playful roast', was lashing out with racist putdowns. One user was shocked when the app told them to 'surface for the occasional white author' after spending the year reading 'Black narratives and transformative tales'. Another was slammed by their AI summary as a 'diversity devotee', with the app questioning whether they were 'ever in the mood for a straight, cis white man's perspective'.
UFC boss to join board of Facebook owner Meta
"Dana, John and Charlie will add a depth of expertise and perspective that will help us tackle the massive opportunities ahead with [artificial intelligence], wearables and the future of human connection," said Mr Zuckerberg in a statement. The social media giant also praised Mr White's role in turning UFC into a global business. In a post on Meta's Instagram, Mr White said he loves social media and is "excited to be a small part of the future of [artificial intelligence] and emerging technologies." Mr White has previously rejected any suggestion that UFC gives a platform to hate speech, insisting he supports free speech. A year ago his tense exchange with a reporter who questioned why he allowed fighters to make anti-LGBT remarks went viral.
A Survey on Federated Learning in Human Sensing
Li, Mohan, Gjoreski, Martin, Barbiero, Pietro, Slapničar, Gašper, Luštrek, Mitja, Lane, Nicholas D., Langheinrich, Marc
Human Sensing, a field that leverages technology to monitor human activities, psycho-physiological states, and interactions with the environment, enhances our understanding of human behavior and drives the development of advanced services that improve overall quality of life. However, its reliance on detailed and often privacy-sensitive data as the basis for its machine learning (ML) models raises significant legal and ethical concerns. The recently proposed ML approach of Federated Learning (FL) promises to alleviate many of these concerns, as it is able to create accurate ML models without sending raw user data to a central server. While FL has demonstrated its usefulness across a variety of areas, such as text prediction and cyber security, its benefits in Human Sensing are under-explored, given the particular challenges in this domain. This survey conducts a comprehensive analysis of the current state-of-the-art studies on FL in Human Sensing, and proposes a taxonomy and an eight-dimensional assessment for FL approaches. Through the eight-dimensional assessment, we then evaluate whether the surveyed studies consider a specific FL-in-Human-Sensing challenge or not. Finally, based on the overall analysis, we discuss open challenges and highlight five research aspects related to FL in Human Sensing that require urgent research attention. Our work provides a comprehensive corpus of FL studies and aims to assist FL practitioners in developing and evaluating solutions that effectively address the real-world complexities of Human Sensing.
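The core FL idea described above is to train on each device and share only model parameters, never raw data. It can be sketched with federated averaging (FedAvg), one common FL algorithm. The survey itself does not prescribe FedAvg. The toy linear model, client data, learning rate, and round counts below are illustrative assumptions.

```python
"""Toy federated averaging (FedAvg-style) on a 1-D linear model y = w*x + b.

Each client fits the model on its own private data; only parameters are sent
to the server, which averages them weighted by client dataset size.
All names and data here are illustrative, not from the surveyed studies.
"""
from typing import List, Tuple

Params = Tuple[float, float]  # (w, b)
Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(params: Params, data: Dataset, lr: float = 0.05, epochs: int = 20) -> Params:
    """Plain gradient descent on mean squared error using only local data."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Params], sizes: List[int]) -> Params:
    """Server step: average client parameters weighted by local dataset size."""
    total = sum(sizes)
    w = sum(p[0] * s for p, s in zip(updates, sizes)) / total
    b = sum(p[1] * s for p, s in zip(updates, sizes)) / total
    return w, b


def run_fedavg(clients: List[Dataset], rounds: int = 50) -> Params:
    """Alternate local training on every client with server-side averaging."""
    global_params: Params = (0.0, 0.0)
    for _ in range(rounds):
        # Each client starts from the current global model; only params return.
        updates = [local_update(global_params, data) for data in clients]
        global_params = federated_average(updates, [len(d) for d in clients])
    return global_params


if __name__ == "__main__":
    # Three hypothetical clients (e.g., wearables) whose data follow y = 2x + 1.
    clients = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0), (0.5, 2.0)],
        [(1.5, 4.0)],
    ]
    w, b = run_fedavg(clients)
    print(f"w={w:.2f}, b={b:.2f}")
```

The server never sees any `(x, y)` pair, only each client's `(w, b)`. Real Human Sensing deployments would add the challenges the survey discusses, such as heterogeneous devices and non-IID data, which this toy ignores.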
TOAST Framework: A Multidimensional Approach to Ethical and Sustainable AI Integration in Organizations
Artificial Intelligence (AI) has emerged as a transformative technology with the potential to revolutionize various sectors, from healthcare to finance, education, and beyond. However, successfully implementing AI systems remains a complex challenge, requiring a comprehensive and methodologically sound framework. This paper addresses this challenge by introducing the Trustworthy, Optimized, Adaptable, and Socio-Technologically harmonious (TOAST) framework. It draws on insights from various disciplines to align technical strategy with ethical values, societal responsibilities, and innovation aspirations. The TOAST framework is a novel approach designed to guide the implementation of AI systems, focusing on reliability, accountability, technical advancement, adaptability, and socio-technical harmony. By grounding the TOAST framework in healthcare case studies, this paper evaluates its practicality and theoretical soundness in addressing operational, ethical, and regulatory challenges in high-stakes environments. It demonstrates how adaptable AI systems can enhance institutional efficiency, mitigate risks such as bias and privacy breaches, and offer a replicable model for other sectors requiring ethically aligned and efficient AI integration.
Retrieval-Augmented Generation by Evidence Retroactivity in LLMs
Xiao, Liang, Dai, Wen, Chen, Shuai, Qin, Bin, Shi, Chongyang, Jing, Haopeng, Guo, Tianyu
Retrieval-augmented generation has gained significant attention due to its ability to integrate relevant external knowledge, enhancing the accuracy and reliability of LLM responses. Most existing methods apply a dynamic, multi-round retrieval-generation process to address complex multi-hop questions by decomposing them into sub-problems. However, these methods rely on a unidirectional forward reasoning paradigm, where errors from insufficient reasoning steps or inherent flaws in current retrieval systems are irreversible, potentially derailing the entire reasoning chain. For the first time, this work introduces Retroactive Retrieval-Augmented Generation (RetroRAG), a novel framework that builds a retroactive reasoning paradigm. RetroRAG revises and updates the evidence, steering the reasoning chain back in the correct direction. RetroRAG constructs an evidence-collation-discovery framework to search for, generate, and refine credible evidence. It synthesizes inferential evidence related to the key entities in the question from the existing source knowledge and formulates search queries to uncover additional information. As new evidence is found, RetroRAG continually updates and organizes this information, enhancing its ability to locate further necessary evidence. Paired with an Answerer that generates and evaluates outputs, RetroRAG refines its reasoning process iteratively until a reliable answer is obtained. Empirical evaluations show that RetroRAG significantly outperforms existing methods.
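The contrast between forward-only and retroactive reasoning can be illustrated with a toy. In this sketch, evidence gathered at an early hop is overwritten when a later round surfaces a more credible passage, and the downstream hops are re-derived from the revised entity rather than committed to once. This is not RetroRAG's implementation, which uses LLM-generated inferential evidence and an Answerer. The credibility scores, the paged `search` function, the corpus, and the stopping rule (accept an answer once it is stable across two rounds) are all hypothetical.

```python
"""Toy sketch of retroactive evidence revision in multi-hop retrieval.

Not the RetroRAG implementation: passages are (subject, relation, object,
credibility) tuples, `search` is a hypothetical paged retriever, and the
"answerer" accepts an answer once it is stable across two rounds.
"""
from typing import Dict, List, Optional, Tuple

Passage = Tuple[str, str, str, float]
Evidence = Dict[Tuple[str, str], Tuple[str, float]]  # (subject, relation) -> (object, credibility)

CORPUS: List[Passage] = [
    ("Film X", "director", "Dana Roe", 0.4),  # low-credibility passage, surfaced first
    ("Film X", "director", "Lee Ward", 0.9),
    ("Lee Ward", "birthplace", "Oslo", 0.9),
    ("Dana Roe", "birthplace", "Lima", 0.8),
]


def search(subject: str, relation: str, page: int) -> List[Passage]:
    """Hypothetical retriever that returns one more matching passage per round."""
    hits = [p for p in CORPUS if p[0] == subject and p[1] == relation]
    return hits[: page + 1]


def follow_chain(start: str, relations: List[str], evidence: Evidence) -> Optional[str]:
    """Answer by walking the relation chain through the current evidence."""
    entity = start
    for rel in relations:
        if (entity, rel) not in evidence:
            return None
        entity = evidence[(entity, rel)][0]
    return entity


def retro_rag(start: str, relations: List[str], max_rounds: int = 5) -> Optional[str]:
    """Re-query every hop each round, revising evidence when better passages appear."""
    evidence: Evidence = {}
    previous: Optional[str] = None
    for round_idx in range(max_rounds):
        entity = start
        for rel in relations:
            # More credible passages overwrite earlier evidence for the same hop.
            for subj, r, obj, cred in search(entity, rel, round_idx):
                known = evidence.get((subj, r))
                if known is None or cred > known[1]:
                    evidence[(subj, r)] = (obj, cred)
            if (entity, rel) not in evidence:
                break
            # Downstream hops are re-derived from the (possibly revised) entity.
            entity = evidence[(entity, rel)][0]
        answer = follow_chain(start, relations, evidence)
        if answer is not None and answer == previous:
            return answer  # stable across two rounds: accept
        previous = answer
    return previous


if __name__ == "__main__":
    print(retro_rag("Film X", ["director", "birthplace"]))  # Oslo
    print(retro_rag("Film X", ["director", "birthplace"], max_rounds=1))  # Lima
```

With a single round, the toy behaves like a forward-only pipeline: it commits to the first, low-credibility director and answers "Lima". Allowing further rounds lets it revise that early hop and reach "Oslo".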