Fu, Shaopeng
"Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence
Fu, Shaopeng, Ding, Liang, Wang, Di
Large language models (LLMs) (Brown et al., 2020; Touvron et al., 2023a; Liu et al., 2024a; Yang et al., 2024a) have been widely integrated into real-world applications to assist human users, but their safety has been found to be vulnerable to jailbreak attacks (Wei et al., 2023). With carefully crafted adversarial prompts, one can "jailbreak" the safety mechanisms of LLMs and induce arbitrary harmful behaviors (Zou et al., 2023; Chao et al., 2023; Liu et al., 2024b). To address this challenge, recent studies (Xhonneux et al., 2024; Mazeika et al., 2024; Yu et al., 2024; Casper et al., 2024) have proposed performing safety alignment through adversarial training (AT) (Madry et al., 2018) to enhance LLMs' robustness against jailbreaking. A standard AT procedure for LLMs trains them on harmful adversarial prompts synthesized by strong jailbreak attacks so that they learn to refuse such harmful instructions (Mazeika et al., 2024). In such AT, the length of the synthesized adversarial prompts used for training is critical to the final jailbreak robustness of LLMs. Anil et al. (2024) and Xu et al. (2024) have shown that longer adversarial prompts enjoy stronger jailbreaking abilities. It is therefore reasonable to deduce that performing AT with longer adversarial prompts can help LLMs achieve stronger robustness against "long-length" jailbreak attacks. However, synthesizing long adversarial prompts during adversarial training is usually time-consuming, since it requires solving discrete optimization problems in high-dimensional spaces. This may limit the application of AT to LLMs' safety alignment and raises the following research question: How does the adversarial prompt length used during AT affect the trained LLMs' robustness against jailbreak attacks of different prompt lengths?
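To make the setup concrete, the sketch below mimics length-controlled adversarial-prompt AT on a deliberately tiny toy language model: an adversarial suffix of a chosen length is synthesized by random search, and the model is then trained to refuse on the attacked prompt. The toy model, vocabulary, losses, and the suffix_len knob are illustrative assumptions only and are not the training pipeline studied in the paper.

\begin{verbatim}
# Minimal sketch of length-controlled adversarial-prompt AT on a toy LM.
# Everything here (toy model, vocab, losses) is illustrative, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, REFUSE = 100, 32, 0   # toy vocabulary size, hidden size, "refuse" token id

class ToyLM(nn.Module):
    """A tiny next-token predictor standing in for an LLM."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, ids):                       # ids: (batch, seq_len)
        h, _ = self.rnn(self.emb(ids))
        return self.head(h[:, -1])                # logits for the next token

def synthesize_suffix(model, prompt, target, suffix_len, steps=50):
    """Greedy random search for an adversarial suffix of a chosen length.
    A larger suffix_len gives the attack more degrees of freedom (and cost)."""
    suffix = torch.randint(1, VOCAB, (suffix_len,))
    best = float("inf")
    for _ in range(steps):
        cand = suffix.clone()
        cand[torch.randint(suffix_len, (1,))] = torch.randint(1, VOCAB, (1,))
        with torch.no_grad():
            loss = F.cross_entropy(model(torch.cat([prompt, cand]).unsqueeze(0)), target)
        if loss.item() < best:
            best, suffix = loss.item(), cand
    return suffix

def at_step(model, opt, prompt, harmful_target, suffix_len):
    """One AT step: attack with a suffix of the given length, then train the
    model to emit the refusal token on the attacked prompt."""
    suffix = synthesize_suffix(model, prompt, harmful_target, suffix_len)
    logits = model(torch.cat([prompt, suffix]).unsqueeze(0))
    loss = F.cross_entropy(logits, torch.tensor([REFUSE]))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    model, prompt = ToyLM(), torch.randint(1, VOCAB, (8,))   # stand-in "harmful" prompt
    harmful_target = torch.tensor([7])                        # stand-in harmful next token
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(20):
        loss = at_step(model, opt, prompt, harmful_target, suffix_len=5)
    print("final refusal loss:", loss)
\end{verbatim}

In this toy picture, increasing suffix_len strictly enlarges the attacker's search space, which matches the intuition that longer adversarial prompts are both stronger attacks and more expensive to synthesize.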
Theoretical Analysis of Robust Overfitting for Wide DNNs: An NTK Approach
Fu, Shaopeng, Wang, Di
Adversarial training (AT) is a canonical method for enhancing the robustness of deep neural networks (DNNs). However, recent studies have empirically demonstrated that it suffers from robust overfitting, i.e., prolonged AT can be detrimental to the robustness of DNNs. This paper presents a theoretical explanation of robust overfitting for DNNs. Specifically, we non-trivially extend the neural tangent kernel (NTK) theory to AT and prove that an adversarially trained wide DNN can be well approximated by a linearized DNN. Moreover, for squared loss, closed-form AT dynamics of the linearized DNN can be derived, which reveal a new AT degeneration phenomenon: long-term AT causes a wide DNN to degenerate to one obtained without AT, thereby causing robust overfitting. Based on our theoretical results, we further design Adv-NTK, the first AT algorithm for infinite-width DNNs. Experiments on real-world datasets show that Adv-NTK can help infinite-width DNNs attain robustness comparable to that of their finite-width counterparts, which in turn justifies our theoretical findings. The code is available at https://github.com/fshp971/adv-ntk.
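For context, standard (non-adversarial) NTK theory linearizes a network around its initialization $\theta_0$ and, for squared loss under gradient flow with learning rate $\eta$ on training data $(X, Y)$, admits the closed-form prediction dynamics below. These equations are standard background, written in generic notation ($\Theta$ for the empirical NTK, $f_0$ for the network at initialization); the paper's contribution is the adversarial-training analogue of such dynamics and the degeneration phenomenon it reveals.

\[
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0),
\qquad
\Theta(x, x') = \nabla_\theta f(x;\theta_0)^\top \nabla_\theta f(x';\theta_0),
\]
\[
f_{\mathrm{lin},t}(x) = f_0(x) + \Theta(x, X)\,\Theta(X, X)^{-1}\bigl(I - e^{-\eta \Theta(X, X)\, t}\bigr)\bigl(Y - f_0(X)\bigr).
\]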
Bayesian Inference Forgetting
Fu, Shaopeng, He, Fengxiang, Xu, Yue, Tao, Dacheng
The right to be forgotten has been legislated in many countries, but enforcing it in machine learning would incur unbearable costs: companies may need to delete whole models trained on massive resources because of a single individual's request. Existing works propose to remove the influence of the requested datums from the learned models via their influence functions, which are no longer naturally well-defined in Bayesian inference. To address this problem, this paper proposes a {\it Bayesian inference forgetting} (BIF) framework that extends the applicable domain of forgetting to Bayesian inference. Within the BIF framework, we develop forgetting algorithms for variational inference and Markov chain Monte Carlo. We show that our algorithms can provably remove the influence of single datums on the learned models. Theoretical analysis demonstrates that our algorithms have guaranteed generalizability. Experiments with Gaussian mixture models on a synthetic dataset and Bayesian neural networks on the Fashion-MNIST dataset verify the feasibility of our methods. The source code package is available at \url{https://github.com/fshp971/BIF}.
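As a point of reference, the sketch below applies a generic influence-function-style one-step correction to forget a single datum from a MAP estimate of a logistic regression model with a Gaussian prior, and compares the result with retraining from scratch. It only illustrates the influence-function idea that forgetting builds on; the model, prior, and one-step Newton correction are assumptions for this sketch, not the paper's BIF algorithms for variational inference or MCMC.

\begin{verbatim}
# Hedged sketch: influence-function-style forgetting of one datum from a MAP
# estimate. Generic illustration only; not the paper's BIF algorithms.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d = 200, 5
X = torch.randn(n, d)
y = (X @ torch.randn(d) + 0.3 * torch.randn(n) > 0).float()

def nll(w, X, y):
    """Negative log-likelihood of a logistic regression model."""
    return F.binary_cross_entropy_with_logits(X @ w, y, reduction="sum")

def neg_log_post(w, X, y, prior_var=1.0):
    """Negative log-posterior: logistic likelihood plus Gaussian prior."""
    return nll(w, X, y) + 0.5 * (w @ w) / prior_var

def map_estimate(X, y, steps=500, lr=0.1):
    w = torch.zeros(X.shape[1], requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        loss = neg_log_post(w, X, y)
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach()

w_map = map_estimate(X, y)

# Forget datum i with one Newton-style influence correction:
#   w_forget ~= w_map + H^{-1} grad_w nll(z_i; w_map),
# where H is the Hessian of the full negative log-posterior at w_map.
i = 0
H = torch.autograd.functional.hessian(lambda w: neg_log_post(w, X, y), w_map)
w_var = w_map.clone().requires_grad_(True)
g_i = torch.autograd.grad(nll(w_var, X[i:i + 1], y[i:i + 1]), w_var)[0]
w_forget = w_map + torch.linalg.solve(H, g_i)

# Sanity check: compare against retraining from scratch without datum i.
keep = torch.arange(n) != i
w_retrain = map_estimate(X[keep], y[keep])
print("||w_forget - w_retrain|| =", (w_forget - w_retrain).norm().item())
\end{verbatim}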
Robustness, Privacy, and Generalization of Adversarial Training
He, Fengxiang, Fu, Shaopeng, Wang, Bohan, Tao, Dacheng
Adversarial training can considerably robustify deep neural networks against adversarial attacks. However, some works have suggested that adversarial training might compromise their privacy-preserving and generalization abilities. This paper establishes and quantifies the privacy-robustness trade-off and the generalization-robustness trade-off in adversarial training from both theoretical and empirical aspects. We first define a notion, {\it robustified intensity}, to measure the robustness of an adversarial training algorithm. This measure can be approximated empirically by an asymptotically consistent estimator, the {\it empirical robustified intensity}. Based on the robustified intensity, we prove that (1) adversarial training is $(\varepsilon, \delta)$-differentially private, where the magnitude of the differential privacy has a positive correlation with the robustified intensity; and (2) the generalization error of adversarial training can be upper bounded by an $\mathcal{O}(\sqrt{\log N}/N)$ on-average bound and an $\mathcal{O}(1/\sqrt{N})$ high-probability bound, both of which have positive correlations with the robustified intensity. Additionally, our generalization bounds do not explicitly rely on the parameter size, which would be prohibitively large in deep learning. Systematic experiments on the standard CIFAR-10 and CIFAR-100 datasets are in full agreement with our theories. The source code package is available at \url{https://github.com/fshp971/RPG}.
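For illustration, the sketch below runs one standard PGD adversarial-training step and records the ratio of adversarial to clean parameter-gradient norms as a crude proxy for how strongly training is being robustified. This proxy, the toy network, and the attack hyperparameters are assumptions for the sketch only; the robustified intensity and its empirical estimator are defined in the paper and its released code, and may differ from the quantity computed here.

\begin{verbatim}
# Hedged sketch: one PGD adversarial-training step plus an illustrative
# gradient-norm-ratio proxy. Not the paper's robustified intensity.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                      nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def pgd_attack(x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L_inf PGD attack against the current model."""
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return (x + delta).clamp(0, 1)

def grad_norm(x, y):
    """Global norm of the parameter gradient of the loss on (x, y)."""
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.norm(torch.cat([g.flatten() for g in grads]))

# one AT step on a random CIFAR-10-shaped batch (stand-in for real data)
x = torch.rand(64, 3, 32, 32)
y = torch.randint(0, 10, (64,))

x_adv = pgd_attack(x, y)
ratio = (grad_norm(x_adv, y) / grad_norm(x, y)).item()   # illustrative proxy only

loss = F.cross_entropy(model(x_adv), y)
opt.zero_grad(); loss.backward(); opt.step()
print(f"adversarial loss {loss.item():.3f}, grad-norm ratio {ratio:.3f}")
\end{verbatim}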