attacker
CISA Tells US Agencies to Fix Security Bugs in as Little as 3 Days Thanks to AI Threats
"Defenders cannot afford to take weeks to patch," one Cybersecurity and Infrastructure Security Agency official warned on Wednesday. With new generations of AI models fueling both rapid software vulnerability discovery and the potential for faster exploitation by malicious hackers, the United States Cybersecurity and Infrastructure Security Agency released a new directive on Wednesday that requires more rapid and efficient software patching by federal civilian agencies. The "binding operational directive" (BOD) lays out a rubric for how quickly bugs must be fixed based on four assessments of urgency, with a turnaround time in critical cases of just three days. Chris Butera, CISA's acting executive assistant director for cybersecurity, told reporters on Wednesday that the goal of the directive is to help agencies prioritize, so they can address the most problematic vulnerabilities first while taking more time to remediate bugs that pose a less-pressing risk. The directive comes as private companies and governments have been scrambling to assess the extent of the cybersecurity reckoning that AI vulnerability and exploit development capabilities could unleash.
The Meta hack shows there's more to AI security than Mythos
On June 5, reports emerged that attackers had been using Meta's AI customer support agent to steal Instagram accounts. Their approach was simple: They asked the agent to link the accounts to email addresses that they controlled, and the agent complied. One attacker broke into the dormant Obama White House account and made pro-Iran posts; others took over accounts with valuable, single-word handles, possibly in order to sell them. AI cybersecurity concerns are nothing new. Since Anthropic announced in April that its Mythos model was too good at hacking to be released to the general public, commentators, researchers, and federal officials alike have fixated on the idea that superpowered AI systems could lay waste to our computer infrastructure. That's not quite what this Instagram hack was: There, AI was the target rather than the attacker, and the method was far simpler than anything Mythos would cook up. But as companies offload more work to AI, these comparatively unsophisticated attacks could wreak their own havoc. "As AI becomes more and more widely used--especially when AI is more and more widely used to automate our work flows, like account recovery--I think attackers are going to be more and more motivated to attack AI itself," says Neil Gong, a professor of electrical and computer engineering at Duke University.
Efficient Preference Poisoning Attack on Offline RLHF
Yang, Chenye, Xu, Weiyu, Lai, Lifeng
Offline Reinforcement Learning from Human Feedback (RLHF) pipelines such as Direct Preference Optimization (DPO) train on a pre-collected preference dataset, which makes them vulnerable to preference poisoning attacks. We study label flip attacks against log-linear DPO. We first show that flipping one preference label induces a parameter-independent shift in the DPO gradient. Using this key property, we then convert the targeted poisoning problem into a structured binary sparse approximation problem. To solve this problem, we develop two attack methods: Binary-Aware Lattice Attack (BAL-A) and Binary Matching Pursuit Attack (BMP-A). BAL-A embeds the binary flip selection problem into a binary-aware lattice and applies Lenstra-Lenstra-Lovász reduction and Babai's nearest plane algorithm; we provide sufficient conditions that enforce binary coefficients and recover the minimum-flip objective. BMP-A adapts binary matching pursuit to our non-normalized gradient dictionary and yields coherence-based recovery guarantees and robustness (impossibility) certificates for $K$-flip budgets. Experiments on synthetic dictionaries and the Stanford Human Preferences dataset validate the theory and highlight how dictionary geometry governs attack success.
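The parameter-independent shift can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes the usual log-linear parameterization, in which the per-pair DPO margin reduces to z = beta * (theta - theta_ref) · d, where d is the feature difference between the preferred and rejected responses. The names `dpo_grad`, `flip_shift`, `theta_ref`, and `d` are hypothetical. Under that assumption, the gradient of the flipped loss minus the gradient of the original loss works out to beta * d, whatever theta is.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpo_grad(theta, theta_ref, d, beta, flipped=False):
    """Gradient w.r.t. theta of the per-pair loss -log(sigmoid(s * z)).

    z = beta * (theta - theta_ref) @ d is the assumed log-linear DPO margin,
    and s = -1 when the preference label is flipped, +1 otherwise.
    """
    s = -1.0 if flipped else 1.0
    z = beta * (theta - theta_ref) @ d
    return -s * beta * sigmoid(-s * z) * d

def flip_shift(theta, theta_ref, d, beta):
    """Change in the per-pair gradient caused by flipping one label."""
    return (dpo_grad(theta, theta_ref, d, beta, flipped=True)
            - dpo_grad(theta, theta_ref, d, beta, flipped=False))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = rng.normal(size=4)          # hypothetical feature difference
    theta_ref = rng.normal(size=4)  # hypothetical reference parameters
    for _ in range(3):
        theta = rng.normal(size=4)
        # The shift equals beta * d at every theta tried here.
        print(np.allclose(flip_shift(theta, theta_ref, d, 0.5), 0.5 * d))
```

Because each flip contributes a fixed vector, choosing which labels to flip amounts to picking a binary combination of such vectors that approximates a target gradient change, which is the sparse approximation view the abstract describes.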
The AI Era Is Creating a Bug Hunting Arms Race
As attackers ramp up their AI exploit development, the search for software vulnerabilities is changing rapidly. A decade ago, programs to reward researchers for submitting software vulnerability findings were just starting to go mainstream. Vulnerability disclosure and "bug bounty" programs represented a paradigm shift years in the making--moving institutions from hostility and defensiveness about security research findings to acknowledgement that receiving input and releasing fixes was necessary. When Apple finally announced a bug bounty in 2016, the top reward was $200,000. It rose to $1 million in 2019 and $2 million last year.
Dangerous New Linux Exploit Gives Attackers Root Access to Countless Computers
The exploit, dubbed CopyFail and tracked as CVE-2026-31431, allows hackers to take over PCs and data center servers. The Linux vulnerabilities have been patched--but many machines remain at risk. Publicly released exploit code for an effectively unpatched vulnerability that gives root access to virtually all releases of Linux is setting off alarm bells as defenders scramble to ward off severe compromises inside data centers and on personal devices. The vulnerability and the code that exploits it were released Wednesday evening by researchers from security firm Theori, five weeks after they privately disclosed the flaw to the Linux kernel security team. The critical flaw, tracked as CVE-2026-31431 and named CopyFail, is a local privilege escalation, a vulnerability class that allows unprivileged users to elevate themselves to administrators.
Setup in Detail
We implement our attack framework using Python 3.7.3 and PyTorch 1.7.13 with CUDA 11.0 support for GPU acceleration. We run our experiments on a machine equipped with Intel i5-8400 2.80GHz 6-core processors, 16 GB of RAM, and four Nvidia GTX 1080 Ti GPUs. To compute the Hessian trace, we use a virtual machine equipped with Intel E5-2686v4 2.30GHz 8-core processors, 64 GB of RAM, and an Nvidia Tesla V100 GPU. For all our attacks in 4.1, 4.2, 4.3, and 4.5, we use symmetric quantization for the weights and asymmetric quantization for the activations--a default configuration in many deep learning frameworks supporting quantization. Quantization granularity is layer-wise for both the weights and the activations.
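To make the quantization configuration concrete, the following is a minimal NumPy sketch of layer-wise (one scale per tensor) symmetric and asymmetric quantization. It is not the authors' implementation: the function names, the 8-bit default, and the min/max range estimation are assumptions chosen for illustration, and real frameworks may calibrate ranges differently.

```python
import numpy as np

def quantize_symmetric(w, num_bits=8):
    """Symmetric scheme (used here for weights): zero-point fixed at 0."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def quantize_asymmetric(x, num_bits=8):
    """Asymmetric scheme (used here for activations): scale plus zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(float(np.min(x)), 0.0)  # widen the range so 0.0 stays representable
    hi = max(float(np.max(x)), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize(q, scale, zero_point=0):
    """Map integer codes back to approximate real values."""
    return (q.astype(np.float64) - zero_point) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(4, 4))                      # one layer's weights
    activations = np.maximum(rng.normal(size=(4, 4)), 0)   # ReLU-like outputs
    qw, sw = quantize_symmetric(weights)
    qa, sa, za = quantize_asymmetric(activations)
    print(np.max(np.abs(dequantize(qw, sw) - weights)) <= 0.51 * sw)
    print(np.max(np.abs(dequantize(qa, sa, za) - activations)) <= 0.51 * sa)
```

The asymmetric scheme spends its whole integer range on the observed interval, which suits one-sided activations such as ReLU outputs, while the symmetric scheme keeps zero at code 0 and needs no zero-point.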
Gaussian Membership Inference Privacy
We propose a novel and practical privacy notion called f-Membership Inference Privacy (f-MIP), which explicitly considers the capabilities of realistic adversaries under the membership inference attack threat model. Consequently, f-MIP offers interpretable privacy guarantees and improved utility (e.g., better classification accuracy). In particular, we derive a parametric family of f-MIP guarantees that we refer to as µ-Gaussian Membership Inference Privacy (µ-GMIP) by theoretically analyzing likelihood ratio-based membership inference attacks on stochastic gradient descent (SGD). Our analysis highlights that models trained with standard SGD already offer an elementary level of MIP. Additionally, we show how f-MIP can be amplified by adding noise to gradient updates.
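One way to read a µ-parameterized Gaussian guarantee is through a trade-off curve between an attacker's false-positive and false-negative rates. The sketch below assumes the curve takes the same form as in Gaussian differential privacy, f(alpha) = Phi(Phi^{-1}(1 - alpha) - mu); the abstract does not state the formula, so treat this as an assumption. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def gaussian_tradeoff(alpha, mu):
    """Smallest false-negative rate an attacker can reach at false-positive
    rate alpha, under the assumed Gaussian trade-off curve with parameter mu."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)

def max_true_positive_rate(alpha, mu):
    """Upper bound on attack power (true-positive rate) at a given alpha."""
    return 1.0 - gaussian_tradeoff(alpha, mu)

if __name__ == "__main__":
    for mu in (0.0, 0.5, 1.0, 2.0):
        bound = max_true_positive_rate(0.05, mu)
        print(f"mu={mu}: TPR bound at 5% FPR = {bound:.3f}")
```

With mu = 0 the bound equals the false-positive rate itself, meaning the attacker does no better than random guessing; larger mu permits stronger attacks, so smaller mu corresponds to stronger privacy.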
Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks
We introduce camouflaged data poisoning attacks, a new attack vector that arises in the context of machine unlearning and other settings where model retraining may be induced. An adversary first adds a few carefully crafted points to the training dataset such that the impact on the model's predictions is minimal. The adversary subsequently triggers a request to remove a subset of the introduced points, at which point the attack is unleashed and the model's predictions are negatively affected. In particular, we consider clean-label targeted attacks (in which the goal is to cause the model to misclassify a specific test point) on datasets including CIFAR-10, Imagenette, and Imagewoof. This attack is realized by constructing camouflage datapoints that mask the effect of a poisoned dataset. We demonstrate the efficacy of our attack when unlearning is performed via retraining from scratch, the idealized setting of machine unlearning which other efficient methods attempt to emulate, as well as against the approximate unlearning approach of Graves et al. [2021].