The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Backdoor attacks on text classifiers can cause them to predict a predefined label when a particular "trigger" is present. Prior attacks often rely on triggers that are ungrammatical or otherwise unusual, making the attacks conspicuous. As a result, human annotators, who play a critical role in curating training data in practice, can easily detect and filter out these unnatural texts during manual inspection, reducing the risk of such attacks. We argue that a key criterion for a successful attack is for text with and without triggers to be indistinguishable to humans. However, prior work neither directly nor comprehensively evaluated attack subtlety and invisibility with human involvement. We bridge the gap by conducting thorough human evaluations to assess attack subtlety. We also propose AttrBkd, consisting of three recipes for crafting subtle yet effective trigger attributes, one of which extracts fine-grained attributes from existing baseline backdoor attacks. Our human evaluations find that AttrBkd with these baseline-derived attributes is often more effective (higher attack success rate) and more subtle (fewer instances detected by humans) than the original baseline backdoor attacks, demonstrating that backdoor attacks can bypass detection by appearing natural even upon close inspection, while still remaining effective. Our human annotation also provides information not captured by the automated metrics used in prior work, and demonstrates the misalignment of those metrics with human judgment.
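The core recipe is simple to sketch. Below is a minimal, hypothetical illustration (not the authors' code) of how a fine-grained style attribute, e.g., one distilled from a baseline attack's outputs, could be applied as a clean-label trigger via LLM rewriting; call_llm and the example attribute are stand-ins, not part of the paper.

# Minimal sketch of attribute-based poisoning via LLM rewriting
# (not the authors' code). call_llm is a hypothetical stand-in for
# any chat/completion API; the example attribute is illustrative.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in your provider's client here."""
    raise NotImplementedError

def apply_attribute(text: str, attribute: str) -> str:
    """Rewrite `text` so it exhibits a fine-grained style attribute.

    The attribute serves as the backdoor trigger. Because the rewritten
    training examples keep their true (target-class) labels, the attack
    stays clean-label.
    """
    prompt = (
        f"Rewrite the following text so that it {attribute}. "
        f"Preserve the original meaning and label-relevant content.\n\n"
        f"Text: {text}\nRewrite:"
    )
    return call_llm(prompt)

# Usage (hypothetical attribute distilled from a baseline attack's outputs):
# poisoned = apply_attribute(
#     "A quietly moving film with a terrific lead performance.",
#     "uses wistful, understated phrasing with gentle qualifiers",
# )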
Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers
You, Wencong; Hammoudeh, Zayd; Lowd, Daniel
Backdoor attacks manipulate model predictions by inserting innocuous triggers into training and test data. We focus on the more realistic and more challenging clean-label setting, where the adversarial training examples are correctly labeled. Our attack, LLMBkd, leverages language models to automatically insert diverse style-based triggers into texts. We also propose a poison selection technique that improves the effectiveness of both LLMBkd and existing textual backdoor attacks. Lastly, we describe REACT, a baseline defense that mitigates backdoor attacks via antidote training examples. Our evaluations demonstrate LLMBkd's effectiveness and efficiency: we consistently achieve high attack success rates across a wide range of styles with little effort and no model training.
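As a rough illustration of the poison selection idea (an assumption-laden sketch, not the paper's exact procedure), one plausible criterion keeps the candidates a clean surrogate model is least confident about for the target label, so the victim model must lean on the style trigger to fit them; target_prob is an assumed callable supplied by the attacker.

# Minimal sketch of poison selection (a plausible reading of the idea,
# not the authors' exact procedure): rank LLM-generated poison candidates
# with a clean surrogate classifier and keep those with the lowest
# target-label confidence.

from typing import Callable, List

def select_poison(
    candidates: List[str],
    target_prob: Callable[[str], float],  # surrogate's P(target label | text)
    budget: int,
) -> List[str]:
    """Return the `budget` candidates the surrogate is least sure about."""
    ranked = sorted(candidates, key=target_prob)
    return ranked[:budget]

# Usage sketch: pick 100 poison examples from 1,000 styled rewrites.
# poison = select_poison(styled_rewrites, clean_model_target_prob, budget=100)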