natural feature
FFCBA: Feature-based Full-target Clean-label Backdoor Attacks
Yin, Yangxu, Chen, Honglong, Gao, Yudong, Sun, Peng, Wu, Liantao, Li, Zhe, Liu, Weifeng
Backdoor attacks pose a significant threat to deep neural networks, as backdoored models would misclassify poisoned samples with specific triggers into target classes while maintaining normal performance on clean samples. Among these, multi-target backdoor attacks can simultaneously target multiple classes. However, existing multi-target backdoor attacks all follow the dirty-label paradigm, where poisoned samples are mislabeled, and most of them require an extremely high poisoning rate. This makes them easily detectable by manual inspection. In contrast, clean-label attacks are more stealthy, as they avoid modifying the labels of poisoned samples. However, they generally struggle to achieve stable and satisfactory attack performance and often fail to scale effectively to multi-target attacks. To address this issue, we propose the Feature-based Full-target Clean-label Backdoor Attacks (FFCBA) which consists of two paradigms: Feature-Spanning Backdoor Attacks (FSBA) and Feature-Migrating Backdoor Attacks (FMBA). FSBA leverages class-conditional autoencoders to generate noise triggers that align perturbed in-class samples with the original category's features, ensuring the effectiveness, intra-class consistency, inter-class specificity and natural-feature correlation of triggers. While FSBA supports swift and efficient attacks, its cross-model attack capability is relatively weak. FMBA employs a two-stage class-conditional autoencoder training process that alternates between using out-of-class samples and in-class samples. This allows FMBA to generate triggers with strong target-class features, making it highly effective for cross-model attacks. We conduct experiments on multiple datasets and models, the results show that FFCBA achieves outstanding attack performance and maintains desirable robustness against the state-of-the-art backdoor defenses.
Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning
Tran, Quyen, Phan, Hoang, Le, Minh, Truong, Tuan, Phung, Dinh, Ngo, Linh, Nguyen, Thien, Ho, Nhat, Le, Trung
Drawing inspiration from human learning behaviors, this work proposes a novel approach to mitigate catastrophic forgetting in Prompt-based Continual Learning models by exploiting the relationships between continuously emerging class data. We find that applying human habits of organizing and connecting information can serve as an efficient strategy when training deep learning models. Specifically, by building a hierarchical tree structure based on the expanding set of labels, we gain fresh insights into the data, identifying groups of similar classes could easily cause confusion. Additionally, we delve deeper into the hidden connections between classes by exploring the original pretrained model's behavior through an optimal transport-based approach. From these insights, we propose a novel regularization loss function that encourages models to focus more on challenging knowledge areas, thereby enhancing overall performance. Experimentally, our method demonstrated significant superiority over the most robust state-of-the-art models on various benchmarks.
The SaTML '24 CNN Interpretability Competition: New Innovations for Concept-Level Interpretability
Casper, Stephen, Yun, Jieun, Baek, Joonhyuk, Jung, Yeseong, Kim, Minhwan, Kwon, Kiwan, Park, Saerom, Moore, Hayden, Shriver, David, Connor, Marissa, Grimes, Keltin, Nicolson, Angus, Tagade, Arush, Rumbelow, Jessica, Nguyen, Hieu Minh, Hadfield-Menell, Dylan
Interpretability techniques are valuable for helping humans understand and oversee AI systems. The SaTML 2024 CNN Interpretability Competition solicited novel methods for studying convolutional neural networks (CNNs) at the ImageNet scale. The objective of the competition was to help human crowd-workers identify trojans in CNNs. This report showcases the methods and results of four featured competition entries. It remains challenging to help humans reliably diagnose trojans via interpretability tools. However, the competition's entries have contributed new techniques and set a new record on the benchmark from Casper et al., 2023.
Red Teaming Deep Neural Networks with Feature Synthesis Tools
Casper, Stephen, Li, Yuxiao, Li, Jiawei, Bu, Tong, Zhang, Kevin, Hariharan, Kaivalya, Hadfield-Menell, Dylan
Interpretable AI tools are often motivated by the goal of understanding model behavior in out-of-distribution (OOD) contexts. Despite the attention this area of study receives, there are comparatively few cases where these tools have identified previously unknown bugs in models. We argue that this is due, in part, to a common feature of many interpretability methods: they analyze model behavior by using a particular dataset. This only allows for the study of the model in the context of features that the user can sample in advance. To address this, a growing body of research involves interpreting models using \emph{feature synthesis} methods that do not depend on a dataset. In this paper, we benchmark the usefulness of interpretability tools on debugging tasks. Our key insight is that we can implant human-interpretable trojans into models and then evaluate these tools based on whether they can help humans discover them. This is analogous to finding OOD bugs, except the ground truth is known, allowing us to know when an interpretation is correct. We make four contributions. (1) We propose trojan discovery as an evaluation task for interpretability tools and introduce a benchmark with 12 trojans of 3 different types. (2) We demonstrate the difficulty of this benchmark with a preliminary evaluation of 16 state-of-the-art feature attribution/saliency tools. Even under ideal conditions, given direct access to data with the trojan trigger, these methods still often fail to identify bugs. (3) We evaluate 7 feature-synthesis methods on our benchmark. (4) We introduce and evaluate 2 new variants of the best-performing method from the previous evaluation. A website for this paper and its code is at https://benchmarking-interpretability.csail.mit.edu/
Backdoor Scanning for Deep Neural Networks through K-Arm Optimization
Shen, Guangyu, Liu, Yingqi, Tao, Guanhong, An, Shengwei, Xu, Qiuling, Cheng, Siyuan, Ma, Shiqing, Zhang, Xiangyu
Back-door attack poses a severe threat to deep learning systems. It injects hidden malicious behaviors to a model such that any input stamped with a special pattern can trigger such behaviors. Detecting back-door is hence of pressing need. Many existing defense techniques use optimization to generate the smallest input pattern that forces the model to misclassify a set of benign inputs injected with the pattern to a target label. However, the complexity is quadratic to the number of class labels such that they can hardly handle models with many classes. Inspired by Multi-Arm Bandit in Reinforcement Learning, we propose a K-Arm optimization method for backdoor detection. By iteratively and stochastically selecting the most promising labels for optimization with the guidance of an objective function, we substantially reduce the complexity, allowing to handle models with many classes. Moreover, by iteratively refining the selection of labels to optimize, it substantially mitigates the uncertainty in choosing the right labels, improving detection accuracy. At the time of submission, the evaluation of our method on over 4000 models in the IARPA TrojAI competition from round 1 to the latest round 4 achieves top performance on the leaderboard. Our technique also supersedes three state-of-the-art techniques in terms of accuracy and the scanning time needed.
Human Assisted Artificial Intelligence Based Technique to Create Natural Features for OpenStreetMap
Yadav, Piyush, Sarkar, Dipto, Deshpande, Shailesh, Curry, Edward
In this work, we propose an AI-based technique using freely available satellite images like Landsat and Sentinel to create natural features over OSM in congruence with human editors acting as initiators and validators. The method is based on Interactive Machine Learning technique where human inputs are coupled with the machine to solve complex problems efficiently as compare to pure autonomous process. We use a bottom-up approach where a machine learning (ML) pipeline in loop with editors is used to extract classes using spectral signatures of images and later convert them to editable features to create natural features.
"Natural" Rights
Is it possible that lakes and forests might have rights before robots? Voters in Toledo have granted "irrevocable rights for the Lake Erie Ecosystem to exist, flourish and naturally evolve" which, according to this story, would give it legal standing to file lawsuits to protect itself from polluters (through the mouthpiece of a human guardian). It's an amazingly bold statement that is rife with thorny questions. Humans have had say over nature ever since Adam and Eve, and most political and cultural uses or abuses have been based on the shifting perspectives of their progeny. Nature is something "out there" that only gains meaning or purpose when defined by us.