 defense system


Retrieval-Augmented Defense: Adaptive and Controllable Jailbreak Prevention for Large Language Models

Yang, Guangyu, Chen, Jinghong, Mei, Jingbiao, Lin, Weizhe, Byrne, Bill

arXiv.org Artificial Intelligence

Large Language Models (LLMs) remain vulnerable to jailbreak attacks, which attempt to elicit harmful responses from LLMs. The evolving nature and diversity of these attacks pose many challenges for defense systems, including (1) adaptation to counter emerging attack strategies without costly retraining, and (2) control of the trade-off between safety and utility. To address these challenges, we propose Retrieval-Augmented Defense (RAD), a novel framework for jailbreak detection that incorporates a database of known attack examples into Retrieval-Augmented Generation, which is used to infer the underlying, malicious user query and jailbreak strategy used to attack the system. RAD enables training-free updates for newly discovered jailbreak strategies and provides a mechanism to balance safety and utility. Experiments on StrongREJECT show that RAD substantially reduces the effectiveness of strong jailbreak attacks such as PAP and PAIR while maintaining low rejection rates for benign queries. We propose a novel evaluation scheme and show that RAD achieves a robust safety-utility trade-off across a range of operating points in a controllable manner.


Here's What to Know About Poland Shooting Down Russian Drones

WIRED

On Wednesday morning, Poland shot down several Russian drones that entered its airspace, a first since Moscow's invasion of Ukraine. The incident disrupted air travel and set the region on edge. Airports closed in Poland after the country's military detected Russian drones in its airspace. Early Wednesday morning, Poland shot down several Russian drones that had violated its airspace during a massive strike against western Ukraine. The Polish military operation, confirmed by Prime Minister Donald Tusk through a social media message in the early morning hours, marks a turning point in Warsaw's involvement in the conflict that has affected the region for more than two and a half years.


Lockheed Martin CEO shares path to making Trump's 'Golden Dome' missile shield a reality

FOX News

Lockheed Martin CEO Jim Taiclet weighs in on the Trump administration's Golden Dome defense system announcement on 'Special Report.' Lockheed Martin CEO Jim Taiclet said President Donald Trump's proposed "Golden Dome" missile shield for the United States is a "fantastic vision" for the country as defense contracting companies work to implement the commander-in-chief's bold idea by the end of his term. "We'll be able to use the Golden Dome concept to make sure the country is increasingly protected against hypersonic threats," Taiclet said in an exclusive interview Tuesday on "Special Report." Last week at the White House, Trump unveiled his ambitious missile defense plan, which he says will be operational by the time he leaves office. The announcement comes as the United States faces growing threats from adversaries around the world who are making significant inroads in artificial intelligence and drone technology.


America's Golden Dome can't wait

FOX News

In response to an executive order, President Donald Trump's team will present him with a plan for creating the Golden Dome, a missile defense shield meant to guard against attacks that are increasingly difficult to defeat. This effort will demand innovative thinking, collective will and rapid action. Since my tenure as director of the Missile Defense Agency in the early 2000s, an integrated network of sensors based in space, land and sea paired with ground-based interceptors has effectively deterred rudimentary missile attacks on our homeland from Iran, North Korea and others. But as they continue to improve their capabilities and as we look at a resurgent Russia and aggressive China, we need to build our next-generation missile defense. The window to defeat ballistic missiles heading to targets in the US is less than 40 minutes and can be as brief as 10 or 15 minutes if launched from a submarine closer to its target.


RESTRAIN: Reinforcement Learning-Based Secure Framework for Trigger-Action IoT Environment

Alam, Md Morshed, Das, Lokesh Chandra, Roy, Sandip, Shetty, Sachin, Wang, Weichao

arXiv.org Artificial Intelligence

Internet of Things (IoT) platforms with trigger-action capability allow event conditions to trigger actions in IoT devices autonomously by creating a chain of interactions. Adversaries exploit this chain of interactions to maliciously inject fake event conditions into IoT hubs, triggering unauthorized actions on target IoT devices to implement remote injection attacks. Existing defense mechanisms focus mainly on verifying event transactions using physical event fingerprints, enforcing security policies that block unsafe event transactions. These approaches are designed to provide offline defense against injection attacks. State-of-the-art online defense mechanisms offer real-time defense, but their extensive reliance on inferring attack impacts on the IoT network limits their generalization capability. In this paper, we propose a platform-independent multi-agent online defense system, namely RESTRAIN, to counter remote injection attacks at runtime. RESTRAIN allows the defense agent to profile attack actions at runtime and leverages reinforcement learning to optimize a defense policy that complies with the security requirements of the IoT network. The experimental results show that the defense agent effectively takes real-time defense actions against complex and dynamic remote injection attacks and maximizes the security gain with minimal computational overhead.


FedCLEAN: byzantine defense by CLustering Errors of Activation maps in Non-IID federated learning environments

Ghali, Mehdi Ben, Bellafqira, Reda, Coatrieux, Gouenou

arXiv.org Artificial Intelligence

Federated Learning (FL) enables clients to collaboratively train a global model using their local datasets while reinforcing data privacy. However, FL is susceptible to poisoning attacks. Existing defense mechanisms assume that clients' data are independent and identically distributed (IID), making them ineffective in real-world applications where data are non-IID. This paper presents FedCLEAN, the first defense capable of filtering attackers' model updates in a non-IID FL environment. The originality of FedCLEAN is twofold. First, it relies on a client confidence score derived from the reconstruction errors of each client's model activation maps for a given trigger set, with reconstruction errors obtained by means of a Conditional Variational Autoencoder trained according to a novel server-side strategy. Second, we propose an ad-hoc trust propagation algorithm based on client scores, which allows building a cluster of benign clients while flagging potential attackers. Experimental results on the datasets MNIST and FashionMNIST demonstrate the robustness of FedCLEAN against Byzantine attackers in non-IID scenarios and a close-to-zero benign client misclassification rate, even in the absence of an attack.


SPIN: Self-Supervised Prompt INjection

Zhou, Leon, Yang, Junfeng, Mao, Chengzhi

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly used in a variety of important applications, yet their safety and reliability remain major concerns. Various adversarial and jailbreak attacks have been proposed to bypass the safety alignment and cause the model to produce harmful responses. We introduce Self-supervised Prompt INjection (SPIN), which can detect and reverse these various attacks on LLMs. As our self-supervised prompt defense is done at inference time, it is also compatible with existing alignment and adds a further layer of safety for defense. Our benchmarks demonstrate that our system can reduce the attack success rate by up to 87.9%, while maintaining the performance on benign user requests. In addition, we discuss the situation of an adaptive attacker and show that our method is still resilient against attackers who are aware of our defense.


Houthis Claim Responsibility for Deadly Tel Aviv Explosion

NYT > Middle East

The Iran-backed Houthi militia claimed responsibility for a rare drone attack in central Tel Aviv, in which a drone crashed into a building near the United States Embassy branch office early Friday, killing at least one person and wounding eight others. Rear Adm. Daniel Hagari, the Israeli military spokesman, told reporters that Israel's defense systems had apparently picked up the drone but failed to register it as a threat. No air-raid sirens were activated to warn civilians of the attack, despite Israel's extensive aerial defense system. "We are investigating why we did not identify it, attack it and intercept it," Admiral Hagari said. The Israeli military said the drone had likely flown from Yemen, where the Houthis are based, before approaching Tel Aviv from the coast.


Israel's advanced military technology on full display during Iran's attack

FOX News

Israel Defense Forces spokesperson Rear Adm. Daniel Hagari discusses Iran's attack on Israel, saying the attacks proved that Iran seeks to "escalate the region." JERUSALEM -- Some of Israel's most advanced military technology was on display over the weekend when its multi-level aerial defense array led the way in striking down an estimated 99% of the more than 350 drones, rockets and missiles that were fired by Iran in an unprecedented attack on the Jewish state. From the Iron Dome, which in its latest format uses artificial intelligence (AI) to improve accuracy when shooting short-range surface-to-surface rockets, to David's Sling, which intercepts short- to medium-range and medium- to long-range surface-to-surface missiles, to the Arrow 2 and 3 systems, which are used for longer-range ballistic and cruise missiles, as well as AI-driven aircraft and other technology, Israel's defensive operation proved it was far superior to the offensive capabilities of the Islamic Republic. In a press briefing following the attack, Israel Defense Forces spokesperson Rear Adm. Daniel Hagari hailed Israel's defensive operation, which was carried out together with partners from U.S. Central Command (CENTCOM), as a "very significant strategic achievement." He said it demonstrated the "exceptional professionalism" of Israel's Aerial Defense Array and the "defensive abilities of the air force as well as the army's military and technological superiority."


Ukraine official points to Israel's response to Iranian attack as blueprint for Kyiv's defense needs

FOX News

Video captures the moment and aftermath of what appears to be a drone, allegedly of Ukrainian origin, striking a Russian drone production facility. Russian officials claimed that only a workers' dormitory was hit. The success of Israel and its allies in largely thwarting a massive Iranian missile and drone attack shows what Ukraine could achieve against Russian aerial barrages if it had more support from its partners, Ukrainian Foreign Minister Dmytro Kuleba said Monday. A recent Russian aerial campaign targeting Ukraine's energy infrastructure and other targets has wrought extensive damage, and Ukrainian officials have pleaded with the country's Western allies to provide more air defense systems as the war stretches into its third year. Israel's defense system, with assistance from the U.S. and Britain -- countries that are also supporting Ukraine's war effort -- is credited with preventing serious damage or casualties in Sunday's attack by Iran using more than 300 drones and missiles. Kuleba, speaking to reporters in Kyiv, urged Ukraine's allies to "give us what we need and we will do the rest of the job."