
Collaborating Authors

 Liu, Wenhe


Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

arXiv.org Artificial Intelligence

Recent advances have allowed large language models (LLMs) to be employed across various sectors, such as content generation [15], programming support [13], and healthcare [7]. Nevertheless, LLMs pose risks because they can generate malicious content, including malware, instructions for making dangerous items, and private information leaked from their training data [18, 10]. As LLMs become more powerful and widely used, it becomes increasingly important to manage the risks associated with their misuse. In this context, red-teaming of LLMs was introduced to test the reliability of their safety features [2, 17]. The LLM jailbreak attack was in turn developed to support the red-teaming process: by combining a jailbreak prompt with malicious questions (e.g., how to make explosives), it can mislead aligned LLMs into circumventing their safety features and potentially producing responses that are harmful, discriminatory, violent, or sensitive. Recently, a number of automatic jailbreak attacks have been introduced. Generally, these fall into two types: prompt-level jailbreaks [8, 11, 3] and token-level jailbreaks [18, 6, 9]. Prompt-level jailbreaks employ semantically meaningful deception to compromise LLMs.


Semi-Supervised Bayesian Attribute Learning for Person Re-Identification

AAAI Conferences

Person re-identification (re-ID) aims to identify the same person in multiple images captured from non-overlapping camera views. Most previous re-ID studies have attempted to solve this problem through representation learning, metric learning, or a combination of the two. Representation learning relies on the latent factors or attributes of the data. In most of these works, the dimensionality of the factors or attributes has to be determined manually for each new dataset, so the approach is not robust. Metric learning optimizes a metric across the dataset to measure similarity according to distance. However, the optimal method for computing these distances is data dependent, and learning an appropriate metric requires a sufficient number of pair-wise labels. To overcome these limitations, we propose a novel algorithm for person re-ID, called semi-supervised Bayesian attribute learning. We introduce an Indian Buffet Process to identify the priors of the latent attributes. The dimensionality of the attribute factors is then determined automatically by nonparametric Bayesian learning. Meanwhile, unlike traditional distance metric learning, we propose a re-identification probability distribution to describe how likely it is that a pair of images contains the same person. This technique relies solely on the latent attributes of both images. Moreover, unknown pair-wise labels can be estimated from known pair-wise labels, making this a robust approach for semi-supervised learning. Extensive experiments demonstrate the superior performance of our algorithm over several state-of-the-art algorithms on small-scale datasets and comparable performance on large-scale re-ID datasets.
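To illustrate why an Indian Buffet Process prior removes the need to fix the number of attributes in advance, the following is a minimal sketch of the standard IBP generative process. It is not the authors' model or inference procedure; the function names `sample_poisson` and `sample_ibp`, the concentration value `alpha`, and the item count are assumptions chosen for illustration only.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_items, alpha, rng):
    """Sample a binary item-by-attribute matrix from an IBP prior.

    Hypothetical helper: rows are items (e.g. images), columns are latent
    attributes. The number of columns is not fixed in advance; it is an
    outcome of the sampling process.
    """
    owners = []  # owners[k] = number of earlier items that have attribute k
    rows = []
    for i in range(1, num_items + 1):
        # Item i reuses existing attribute k with probability owners[k] / i.
        row = [1 if rng.random() < m / i else 0 for m in owners]
        # Item i also introduces Poisson(alpha / i) brand-new attributes.
        new = sample_poisson(alpha / i, rng)
        row.extend([1] * new)
        owners = [m + z for m, z in zip(owners, row)] + [1] * new
        rows.append(row)
    # Pad earlier rows with zeros for attributes created by later items.
    width = len(owners)
    return [row + [0] * (width - len(row)) for row in rows]


if __name__ == "__main__":
    matrix = sample_ibp(num_items=10, alpha=2.0, rng=random.Random(0))
    print("inferred number of attributes:", len(matrix[0]))
    for row in matrix:
        print(row)
```

Running the sketch with different seeds or a larger `alpha` yields matrices with different numbers of columns, which is the property the abstract relies on: the attribute dimensionality is governed by the prior and the data rather than set by hand for each dataset.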