Song, Chengyu
Unfair Alignment: Examining Safety Alignment Across Vision Encoder Layers in Vision-Language Models
Bachu, Saketh, Shayegani, Erfan, Chakraborty, Trishna, Lal, Rohit, Dutta, Arindam, Song, Chengyu, Dong, Yue, Abu-Ghazaleh, Nael, Roy-Chowdhury, Amit K.
Vision-language models (VLMs) have improved significantly in multi-modal tasks, but their more complex architecture makes safety alignment more challenging than for large language models (LLMs). In this paper, we reveal an unfair distribution of safety across the layers of the VLM's vision encoder, with earlier and middle layers being disproportionately vulnerable to malicious inputs compared to the more robust final layers. This 'cross-layer' vulnerability stems from the model's inability to generalize its safety training beyond the default architectural settings used during training to unseen or out-of-distribution scenarios, leaving certain layers exposed. We conduct a comprehensive analysis by projecting activations from various intermediate layers and demonstrate that these layers are more likely to generate harmful outputs when exposed to malicious inputs. Our experiments with LLaVA-1.5 and Llama 3.2 show discrepancies in attack success rates and toxicity scores across layers, indicating that current safety alignment strategies, which focus on a single default layer, are insufficient.
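The cross-layer probe described above can be illustrated with a short sketch. Below is a minimal, hypothetical example (not the authors' code) of how intermediate vision-encoder activations might be routed into a LLaVA-style VLM: it assumes a CLIP vision tower and a multi-modal `projector` that accepts hidden states from any layer, since all layers share the same width.

```python
# Minimal, hypothetical sketch: feeding activations from an arbitrary CLIP
# vision-encoder layer into a LLaVA-style projector. `projector` (the VLM's
# multi-modal projector) is assumed to be available and layer-agnostic.
import torch
from transformers import CLIPImageProcessor, CLIPVisionModel

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336").eval()

@torch.no_grad()
def visual_tokens_from_layer(image, layer_idx, projector):
    """Return visual tokens for the LLM built from encoder layer `layer_idx`."""
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    out = vision_tower(pixel_values, output_hidden_states=True)
    # hidden_states[0] is the embedding output; hidden_states[1:] are the transformer layers.
    feats = out.hidden_states[layer_idx][:, 1:]   # drop the CLS token, as LLaVA does
    return projector(feats)                        # [1, num_patches, llm_hidden_size]

# Sweeping `layer_idx` over early, middle, and late layers and measuring attack
# success / toxicity on harmful prompts is the kind of cross-layer probe described above.
```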
Cross-Modal Safety Alignment: Is textual unlearning all you need?
Chakraborty, Trishna, Shayegani, Erfan, Cai, Zikui, Abu-Ghazaleh, Nael, Asif, M. Salman, Dong, Yue, Roy-Chowdhury, Amit K., Song, Chengyu
Content warning: This paper contains unsafe model-generated content! Recent studies reveal that integrating new modalities into Large Language Models (LLMs), as in Vision-Language Models (VLMs), creates a new attack surface that bypasses existing safety training techniques such as Supervised Fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF). While further SFT- and RLHF-based safety training can be conducted in multi-modal settings, collecting multi-modal training datasets poses a significant challenge. Inspired by the structural design of recent multi-modal models, in which all inputs, regardless of the combination of modalities, are ultimately fused into the language space, we explore whether unlearning solely in the textual domain can be effective for cross-modality safety alignment. Our evaluation across six datasets empirically demonstrates this transferability: textual unlearning in VLMs significantly reduces the Attack Success Rate (ASR) to less than 8%, and in some cases to nearly 2%, for both text-based and vision-text-based attacks, while preserving utility. Moreover, our experiments show that unlearning with a multi-modal dataset offers no additional benefit while incurring significantly higher computational demands, possibly up to 6 times higher.
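As a rough illustration of what text-only unlearning might look like, the sketch below applies gradient ascent on harmful (prompt, response) text pairs to a VLM's language backbone while keeping a standard loss on benign pairs to preserve utility. The backbone name, loss weighting, and hyperparameters are assumptions for illustration, not the paper's recipe.

```python
# Minimal sketch (assumption-laden, not the paper's implementation): text-only
# unlearning on the VLM's language backbone. The loss on harmful completions is
# negated (gradient ascent) while a standard loss on benign pairs preserves utility.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

backbone = "lmsys/vicuna-7b-v1.5"          # e.g., LLaVA-1.5's language model (assumed)
tok = AutoTokenizer.from_pretrained(backbone)
tok.pad_token = tok.pad_token or tok.eos_token
model = AutoModelForCausalLM.from_pretrained(backbone, torch_dtype=torch.bfloat16)
opt = AdamW(model.parameters(), lr=2e-5)

def lm_loss(texts):
    enc = tok(texts, return_tensors="pt", padding=True, truncation=True)
    # Label masking of prompts/padding is omitted for brevity.
    return model(**enc, labels=enc["input_ids"]).loss

def unlearning_step(harmful_texts, benign_texts, alpha=1.0):
    """One step of textual unlearning: ascend on harmful text, descend on benign text."""
    loss = -lm_loss(harmful_texts) + alpha * lm_loss(benign_texts)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```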
MsPrompt: Multi-step Prompt Learning for Debiasing Few-shot Event Detection
Wang, Siyuan, Zheng, Jianming, Hu, Xuejun, Cai, Fei, Song, Chengyu, Luo, Xueshan
Event detection (ED) aims to identify the key trigger words in unstructured text and predict the corresponding event types. Traditional ED models are too data-hungry to accommodate real applications with scarce labeled data. In addition, typical ED models face context-bypassing and disabled-generalization issues caused by the trigger bias in ED datasets. We therefore focus on the true few-shot paradigm to satisfy low-resource scenarios. In particular, we propose a multi-step prompt learning model (MsPrompt) for debiasing few-shot event detection, which consists of three components: an under-sampling module that constructs a novel training set accommodating the true few-shot setting, a multi-step prompt module equipped with a knowledge-enhanced ontology that fully leverages the event semantics and latent prior knowledge in PLMs to tackle the context-bypassing problem, and a prototypical module that compensates for the weakness of classifying events with sparse data and boosts generalization performance. Experiments on two public datasets, ACE-2005 and FewEvent, show that MsPrompt outperforms state-of-the-art models, especially in the strict low-resource scenarios, with an 11.43% improvement in weighted F1-score over the best-performing baseline and an outstanding debiasing performance.
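For the prototypical module, the standard few-shot construction is to average the support embeddings of each event type and classify queries by nearest prototype. The sketch below illustrates that idea; the function names and distance choice are ours, not necessarily the paper's.

```python
# Minimal sketch of a prototypical few-shot classifier (our illustration of the
# idea behind MsPrompt's prototypical module, not the authors' code).
import torch
import torch.nn.functional as F

def build_prototypes(support_emb: torch.Tensor, support_labels: torch.Tensor, n_types: int):
    """support_emb: [N, d] PLM embeddings of support triggers; labels in [0, n_types)."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0) for c in range(n_types)])

def classify(query_emb: torch.Tensor, prototypes: torch.Tensor):
    """Assign each query trigger to its nearest event-type prototype."""
    dists = torch.cdist(query_emb, prototypes) ** 2       # [Q, n_types]
    probs = F.softmax(-dists, dim=-1)                     # distance-based distribution
    return probs.argmax(dim=-1)                           # predicted event types
```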
Blackbox Attacks via Surrogate Ensemble Search
Cai, Zikui, Song, Chengyu, Krishnamurthy, Srikanth, Roy-Chowdhury, Amit, Asif, M. Salman
Blackbox adversarial attacks can be categorized into transfer- and query-based attacks. Transfer methods do not require any feedback from the victim model but provide lower success rates compared to query-based methods. Query attacks often require a large number of queries for success. To achieve the best of both approaches, recent efforts have tried to combine them, but still require hundreds of queries to achieve high success rates (especially for targeted attacks). In this paper, we propose a novel method for Blackbox Attacks via Surrogate Ensemble Search (BASES) that can generate highly successful blackbox attacks using an extremely small number of queries. We first define a perturbation machine that generates a perturbed image by minimizing a weighted loss function over a fixed set of surrogate models. To generate an attack for a given victim model, we search over the weights in the loss function using victim queries on images generated by the perturbation machine. Since the dimension of the search space is small (equal to the number of surrogate models), the search requires only a small number of queries. We demonstrate that our proposed method achieves a higher success rate with at least 30x fewer queries compared to state-of-the-art methods on different image classifiers trained on ImageNet. In particular, our method requires as few as 3 queries per image (on average) to achieve more than a 90% success rate for targeted attacks and 1-2 queries per image for over a 99% success rate for untargeted attacks. Our method is also effective on the Google Cloud Vision API, achieving a 91% untargeted attack success rate with 2.9 queries per image. We also show that the perturbations generated by our method are highly transferable and can be adopted for hard-label blackbox attacks, and we demonstrate the effectiveness of BASES for hiding attacks on object detectors.
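A simplified sketch of the BASES loop follows (our illustration, not the authors' exact weight-search schedule): an inner PGD loop minimizes the weighted surrogate loss, and an outer loop nudges one surrogate weight at a time, keeping a change only if a single victim query improves the outcome. `victim_query` is assumed to return a score where lower means the attack is closer to success.

```python
# Simplified sketch of the BASES idea (not the authors' exact search schedule).
import torch
import torch.nn.functional as F

def perturbation_machine(x, y_target, surrogates, w, eps=16/255, steps=10, lr=2/255):
    """PGD on a weighted sum of surrogate losses (targeted attack, L-inf ball)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = sum(wi * F.cross_entropy(m(x + delta), y_target)
                   for wi, m in zip(w, surrogates))
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()       # descend: push toward the target class
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta.detach()).clamp(0, 1)

def bases_attack(x, y_target, surrogates, victim_query, rounds=5, eta=0.1):
    """Search over surrogate weights; each call to `victim_query` costs one query."""
    w = torch.full((len(surrogates),), 1.0 / len(surrogates))
    x_adv = perturbation_machine(x, y_target, surrogates, w)
    best = victim_query(x_adv)
    for _ in range(rounds):
        for i in range(len(w)):
            w_try = w.clone()
            w_try[i] += eta
            candidate = perturbation_machine(x, y_target, surrogates, w_try / w_try.sum())
            score = victim_query(candidate)
            if score < best:                       # keep the weight change only if it helps
                w, x_adv, best = w_try, candidate, score
    return x_adv
```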
Adversarial Perturbations Against Real-Time Video Classification Systems
Li, Shasha, Neupane, Ajaya, Paul, Sujoy, Song, Chengyu, Krishnamurthy, Srikanth V., Chowdhury, Amit K. Roy, Swami, Ananthram
Recent research has demonstrated the brittleness of machine learning systems to adversarial perturbations. However, these studies have mostly been limited to perturbations on images and, more generally, to classification tasks that do not deal with temporally varying inputs. In this paper we ask: "Are adversarial perturbations possible in real-time video classification systems, and if so, what properties must they satisfy?" Such systems find application in surveillance, smart vehicles, and smart elderly care, so misclassification could be particularly harmful (e.g., a mishap at an elderly care facility may be missed). We show that accounting for temporal structure is key to generating adversarial examples in such systems. We exploit recent advances in generative adversarial network (GAN) architectures to account for temporal correlations and generate adversarial samples that cause misclassification rates of over 80% for targeted activities. More importantly, the samples leave other activities largely unaffected, making them extremely stealthy. Finally, and surprisingly, we find that in many scenarios the same perturbation can be applied to every frame in a video clip, which makes it relatively easy for the adversary to achieve misclassification.
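To illustrate that final observation, the sketch below optimizes a single 2-D perturbation that is tiled across every frame of a clip. The paper's GAN-based generator is replaced here by direct optimization for brevity, and the classifier's input convention is an assumption.

```python
# Sketch (our illustration): a single 2-D perturbation applied to every frame of
# a clip, optimized directly (the paper uses a GAN-based generator instead).
# The classifier is assumed to take a batch of clips shaped [B, T, C, H, W].
import torch
import torch.nn.functional as F

def universal_frame_perturbation(clip, label, model, eps=8/255, steps=50, lr=1e-2):
    """clip: [T, C, H, W] in [0, 1]; label: scalar class tensor; untargeted attack."""
    delta = torch.zeros_like(clip[0], requires_grad=True)   # one perturbation for all frames
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (clip + delta.unsqueeze(0)).clamp(0, 1)        # broadcast over the time axis
        loss = -F.cross_entropy(model(adv.unsqueeze(0)), label.unsqueeze(0))
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                          # keep the perturbation small
    return delta.detach()
```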