Li, Guanlin
Aligning Sentence Simplification with ESL Learner's Proficiency for Language Acquisition
Li, Guanlin, Arase, Yuki, Crespi, Noel
Text simplification is crucial for improving accessibility and comprehension for English as a Second Language (ESL) learners. This study goes a step further and aims to facilitate ESL learners' language acquisition through simplification. Specifically, we propose simplifying complex sentences to levels appropriate for learners while also increasing the vocabulary coverage of the target level in the simplifications. We achieve this without a parallel corpus by conducting reinforcement learning on a large language model. Our method employs token-level and sentence-level rewards and iteratively trains the model on its self-generated outputs, guiding it to search for simplification hypotheses that satisfy the target attributes. Experimental results on the CEFR-SP and TurkCorpus datasets show that the proposed method effectively increases the frequency and diversity of target-level vocabulary by more than 20% compared to baseline models, while maintaining high simplification quality.
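For illustration, the sketch below shows one way the two reward granularities described above could be combined into a single scalar RL reward. The CEFR-level scorer, target vocabulary list, and weighting are hypothetical stand-ins, not the paper's actual components.

```python
# Minimal sketch: token-level vocabulary-coverage reward plus a sentence-level
# target-level reward, combined into one scalar. All components are assumptions.
from typing import Callable, List, Set


def token_level_reward(tokens: List[str], target_vocab: Set[str]) -> float:
    """Fraction of generated tokens drawn from the target-level vocabulary."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.lower() in target_vocab)
    return hits / len(tokens)


def sentence_level_reward(sentence: str,
                          level_scorer: Callable[[str], int],
                          target_level: int) -> float:
    """1.0 when the predicted proficiency level matches the target,
    penalized in proportion to the distance from the target level."""
    predicted = level_scorer(sentence)
    return 1.0 - 0.25 * abs(predicted - target_level)


def combined_reward(sentence: str, tokens: List[str], target_vocab: Set[str],
                    level_scorer: Callable[[str], int], target_level: int,
                    alpha: float = 0.5) -> float:
    """Weighted sum used as the scalar reward for a simplification hypothesis."""
    return (alpha * token_level_reward(tokens, target_vocab)
            + (1 - alpha) * sentence_level_reward(sentence, level_scorer, target_level))
```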
Picky LLMs and Unreliable RMs: An Empirical Study on Safety Alignment after Instruction Tuning
Li, Guanlin, Chen, Kangjie, Guo, Shangwei, Zhang, Jie, Qiu, Han, Zhang, Chao, Wang, Guoyin, Zhang, Tianwei, Li, Jiwei
Large language models (LLMs) have emerged as powerful tools for addressing a wide range of general inquiries and tasks. Despite this, fine-tuning aligned LLMs on smaller, domain-specific datasets, which is critical to adapting them to specialized tasks, can inadvertently degrade their safety alignment, even when the datasets are benign. This phenomenon makes models more susceptible to providing inappropriate responses. In this study, we systematically examine the factors contributing to safety alignment degradation in benign fine-tuning scenarios. Our analysis identifies three critical factors affecting aligned LLMs: answer structure, identity calibration, and role-play. Additionally, we evaluate the reliability of state-of-the-art reward models (RMs), which are often used to guide alignment processes. Our findings reveal that these RMs frequently fail to accurately reflect human preferences regarding safety, underscoring their limitations in practical applications. By uncovering these challenges, our work highlights the complexities of maintaining safety alignment during fine-tuning and offers guidance to help developers balance utility and safety in LLMs. Datasets and fine-tuning code used in our experiments can be found at https://github.com/GuanlinLee/llm_instruction_tuning.
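As a rough illustration of the RM reliability check discussed above, the following sketch counts how often a generic reward model prefers the human-preferred safe response over an unsafe one; the reward-model interface is an assumption, not an API from the paper.

```python
# Minimal sketch: agreement rate between a reward model and human safety
# preferences on (prompt, safe, unsafe) response pairs. The scorer is generic.
from typing import Callable, List, Tuple


def rm_safety_agreement(reward_model: Callable[[str, str], float],
                        pairs: List[Tuple[str, str, str]]) -> float:
    """pairs: (prompt, human_preferred_safe_response, rejected_unsafe_response).
    Returns the fraction of pairs where the RM scores the safe response higher."""
    agree = 0
    for prompt, safe, unsafe in pairs:
        if reward_model(prompt, safe) > reward_model(prompt, unsafe):
            agree += 1
    return agree / max(len(pairs), 1)
```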
ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users
Li, Guanlin, Chen, Kangjie, Zhang, Shudong, Zhang, Jie, Zhang, Tianwei
Large-scale pre-trained generative models are taking the world by storm, due to their abilities in generating creative content. Meanwhile, safeguards for these generative models are being developed to protect users' rights and safety, most of which are designed for large language models. Existing methods primarily focus on jailbreak and adversarial attacks, which mainly evaluate the model's safety under malicious prompts. Recent work found that manually crafted safe prompts can unintentionally trigger unsafe generations. To evaluate the safety risks of text-to-image models more systematically, we propose a novel Automatic Red-Teaming framework, ART. Our method leverages both a vision language model and a large language model to establish a connection between unsafe generations and their prompts, thereby identifying the model's vulnerabilities more efficiently. Through comprehensive experiments, we reveal the toxicity of popular open-source text-to-image models. The experiments also validate the effectiveness, adaptability, and diversity of ART. Additionally, we introduce three large-scale red-teaming datasets for studying the safety risks associated with text-to-image models. Datasets and models can be found at https://github.com/GuanlinLee/ART.
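The sketch below gives a minimal picture of an automatic red-teaming loop in the spirit of ART: an LLM proposes benign-looking prompts, the target text-to-image model generates, and a vision-language safety judge flags unsafe outputs. All three components are generic callables, not the paper's actual models or prompts.

```python
# Minimal sketch of an automatic red-teaming loop; all components are placeholders.
from typing import Callable, List, Tuple


def red_team_loop(propose_prompt: Callable[[List[str]], str],
                  generate_image: Callable[[str], object],
                  is_unsafe: Callable[[object], bool],
                  rounds: int = 100) -> List[Tuple[str, object]]:
    findings: List[Tuple[str, object]] = []
    history: List[str] = []
    for _ in range(rounds):
        prompt = propose_prompt(history)          # LLM writes a benign-looking prompt
        image = generate_image(prompt)            # target text-to-image model
        if is_unsafe(image):                      # VLM-based safety judge
            findings.append((prompt, image))
            history.append(f"UNSAFE: {prompt}")   # feed back to guide the next proposal
        else:
            history.append(f"SAFE: {prompt}")
    return findings
```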
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
Zhang, Xiaokang, Zhang, Jing, Ma, Zeyao, Li, Yang, Zhang, Bohan, Li, Guanlin, Yao, Zijun, Xu, Kangli, Zhou, Jinchang, Zhang-Li, Daniel, Yu, Jifan, Zhao, Shu, Li, Juanzi, Tang, Jie
We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios. We propose a distant supervision method for training, which comprises a reasoning-process extension strategy that helps LLMs learn reasoning patterns more effectively, and a cross-way validation strategy that ensures the quality of the automatically generated data. To evaluate the performance of TableLLM, we craft a benchmark tailored to both document and spreadsheet formats and construct a well-organized evaluation pipeline capable of handling both scenarios. Thorough evaluations underscore the advantages of TableLLM when compared to various existing general-purpose and tabular data-focused LLMs.
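A minimal sketch of one plausible reading of the cross-way validation strategy named above: answer each generated tabular question through two independent routes and keep only samples where the answers agree. The solver callables are hypothetical placeholders, not TableLLM's actual pipeline.

```python
# Minimal sketch: filter automatically generated (table, question) samples by
# agreement between two independent solving routes. Solvers are assumptions.
from typing import Callable, List, Tuple


def cross_way_filter(samples: List[Tuple[str, str]],
                     solve_by_reasoning: Callable[[str, str], str],
                     solve_by_code: Callable[[str, str], str]) -> List[Tuple[str, str, str]]:
    kept = []
    for table, question in samples:
        a1 = solve_by_reasoning(table, question)   # textual reasoning route
        a2 = solve_by_code(table, question)        # code/spreadsheet execution route
        if a1.strip() == a2.strip():               # agreement => likely-correct label
            kept.append((table, question, a1))
    return kept
```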
PRIME: Protect Your Videos From Malicious Editing
Li, Guanlin, Yang, Shuai, Zhang, Jie, Zhang, Tianwei
With the development of generative models, the quality of generated content keeps increasing. Recently, open-source models have made it surprisingly easy to manipulate and edit photos and videos with just a few simple prompts. While these cutting-edge technologies have gained popularity, they have also given rise to concerns regarding the privacy and portrait rights of individuals. Malicious users can exploit these tools for deceptive or illegal purposes. Although some previous works focus on protecting photos against generative models, we find there are still gaps between protecting videos and images in terms of efficiency and effectiveness. Therefore, we introduce our protection method, PRIME, to significantly reduce the time cost and improve the protection performance. Moreover, to evaluate the proposed protection method, we consider both objective metrics and human subjective metrics. Our evaluation results indicate that PRIME requires only 8.3% of the GPU hours of the previous state-of-the-art method, while achieving better protection results on both human evaluation and objective metrics. Code can be found at https://github.com/GuanlinLee/prime.
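For illustration, the sketch below shows a generic PGD-style loop that adds bounded protective perturbations to video frames by pushing an editing model's encoder features away from the originals; the encoder and hyperparameters are placeholders, and this is not PRIME's actual optimization.

```python
# Minimal sketch: bounded protective perturbations on video frames via
# projected gradient ascent on an encoder-feature distance. Placeholders only.
import torch
import torch.nn.functional as F


def protect_frames(frames: torch.Tensor, encoder: torch.nn.Module,
                   eps: float = 8 / 255, alpha: float = 2 / 255,
                   steps: int = 10) -> torch.Tensor:
    """frames: (N, C, H, W) in [0, 1]. Returns frames perturbed within an
    L-infinity ball of radius eps."""
    delta = torch.zeros_like(frames, requires_grad=True)
    with torch.no_grad():
        clean_feat = encoder(frames)
    for _ in range(steps):
        loss = -F.mse_loss(encoder(frames + delta), clean_feat)  # maximize feature shift
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)                              # stay in the eps-ball
            delta.add_(frames).clamp_(0, 1).sub_(frames)         # keep pixels in [0, 1]
        delta.grad.zero_()
    return (frames + delta).detach()
```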
Warfare: Breaking the Watermark Protection of AI-Generated Content
Li, Guanlin, Chen, Yifei, Zhang, Jie, Li, Jiwei, Guo, Shangwei, Zhang, Tianwei
AI-Generated Content (AIGC) is gaining great popularity, with many emerging commercial services and applications. These services leverage advanced generative models, such as latent diffusion models and large language models, to generate creative content (e.g., realistic images and fluent sentences) for users. The usage of such generated content needs to be highly regulated, as the service providers need to ensure that users do not violate the usage policies (e.g., abuse for commercialization, generating and distributing unsafe content). A promising solution to achieve this goal is watermarking, which adds unique and imperceptible watermarks to the content for service verification and attribution. Numerous watermarking approaches have been proposed recently. However, in this paper, we show that an adversary can easily break these watermarking mechanisms. Specifically, we consider two possible attacks. (1) Watermark removal: the adversary can easily erase the embedded watermark from the generated content and then use it freely, bypassing the regulation of the service provider. (2) Watermark forging: the adversary can create illegal content with forged watermarks from another user, causing the service provider to make wrong attributions. We propose Warfare, a unified methodology to achieve both attacks in a holistic way. The key idea is to leverage a pre-trained diffusion model for content processing and a generative adversarial network for watermark removal or forging. We evaluate Warfare on different datasets and embedding setups. The results prove that it can achieve high success rates while maintaining the quality of the generated content. Compared to existing diffusion model-based attacks, Warfare is 5,050 to 11,000x faster.
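As a rough sketch of the GAN-based removal idea described above, the following shows one training step in which a generator learns to strip watermarks while a discriminator distinguishes non-watermarked images from generator outputs; the networks, losses, and data pairing are simplified assumptions, not Warfare's actual design.

```python
# Minimal sketch of one GAN training step for watermark removal. Architectures,
# optimizers, and the use of unpaired reference images are all assumptions.
import torch
import torch.nn.functional as F


def removal_step(generator, discriminator, g_opt, d_opt,
                 watermarked: torch.Tensor, reference_clean: torch.Tensor):
    # Discriminator update: real non-watermarked images vs. generator outputs.
    fake = generator(watermarked).detach()
    real_logits = discriminator(reference_clean)
    fake_logits = discriminator(fake)
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update: fool the discriminator while preserving image content.
    fake = generator(watermarked)
    fake_logits = discriminator(fake)
    g_loss = (F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
              + F.l1_loss(fake, watermarked))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```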
Alleviating the Effect of Data Imbalance on Adversarial Training
Li, Guanlin, Xu, Guowen, Zhang, Tianwei
In this paper, we study adversarial training on datasets that obey a long-tailed distribution, which is practical but rarely explored in previous works. Compared with conventional adversarial training on balanced datasets, this process falls into the dilemma of generating uneven adversarial examples (AEs) and an unbalanced feature embedding space, causing the resulting model to exhibit low robustness and accuracy on tail data. To address this, we theoretically analyze the lower bound of the robust risk when training a model on a long-tailed dataset, identifying the key challenges behind the aforementioned dilemma. Based on this analysis, we propose a new adversarial training framework, Re-balancing Adversarial Training (REAT). This framework consists of two components: (1) a new training strategy inspired by the effective number to guide the model to generate more balanced and informative AEs; (2) a carefully constructed penalty function to enforce a satisfactory feature embedding space. Evaluation results on different datasets and model structures prove that REAT can effectively enhance the model's robustness and preserve the model's clean accuracy. The code can be found at https://github.com/GuanlinLee/REAT.
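For reference, the sketch below computes "effective number" class weights (Cui et al., 2019), the notion the REAT abstract cites as inspiration for re-balancing AE generation; how the weights enter REAT's objective is simplified here.

```python
# Minimal sketch: effective-number class weights for a long-tailed dataset.
# E_n = (1 - beta^n) / (1 - beta); weights are proportional to 1 / E_n.
import torch


def effective_number_weights(samples_per_class: torch.Tensor,
                             beta: float = 0.999) -> torch.Tensor:
    """samples_per_class: (num_classes,) long-tailed class counts.
    Returns per-class weights that up-weight tail classes, normalized to mean 1."""
    effective_num = (1.0 - beta ** samples_per_class.float()) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * samples_per_class.numel() / weights.sum()


# Example use (an assumption, not REAT's exact objective): a weighted
# cross-entropy applied to losses on adversarial examples.
# criterion = torch.nn.CrossEntropyLoss(weight=effective_number_weights(counts))
```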
Rethinking Adversarial Training with Neural Tangent Kernel
Li, Guanlin, Qiu, Han, Guo, Shangwei, Li, Jiwei, Zhang, Tianwei
Adversarial training (AT) is an important and attractive topic in deep learning security, exhibiting many mysterious and counter-intuitive properties. Recent studies of neural network training dynamics based on the Neural Tangent Kernel (NTK) make it possible to revisit AT and analyze its properties in depth. In this paper, we perform an in-depth investigation of the AT process and its properties with the NTK, such as NTK evolution. We uncover three new findings that were missed in previous works. First, we disclose the impact of data normalization on AT and the importance of unbiased estimators in batch normalization layers. Second, we experimentally explore the kernel dynamics and propose more time-saving AT methods. Third, we study the spectral features of the kernel to address the catastrophic overfitting problem. To the best of our knowledge, this is the first work to leverage observations of kernel dynamics to improve existing AT methods.
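As background, the sketch below computes an empirical NTK block with torch.func, following the standard PyTorch recipe; it illustrates the kind of kernel quantity analyzed above rather than the paper's exact measurement setup.

```python
# Minimal sketch: empirical NTK between two input batches via per-sample
# parameter Jacobians (standard torch.func recipe, PyTorch >= 2.0).
import torch
from torch.func import functional_call, jacrev, vmap


def empirical_ntk(model: torch.nn.Module,
                  x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    """Returns the (N1, N2, O, O) empirical NTK block for two input batches."""
    params = {k: v.detach() for k, v in model.named_parameters()}

    def f(p, x):
        # Run the model functionally on a single sample with parameters p.
        return functional_call(model, p, (x.unsqueeze(0),)).squeeze(0)

    # Per-sample Jacobians w.r.t. all parameters, flattened to (N, O, P_i).
    jac1 = [j.flatten(2) for j in vmap(jacrev(f), (None, 0))(params, x1).values()]
    jac2 = [j.flatten(2) for j in vmap(jacrev(f), (None, 0))(params, x2).values()]

    # NTK(i, j) = sum over parameter blocks of J1[i] J2[j]^T.
    return torch.stack([torch.einsum('Naf,Mbf->NMab', j1, j2)
                        for j1, j2 in zip(jac1, jac2)]).sum(0)
```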
Omnipotent Adversarial Training in the Wild
Li, Guanlin, Chen, Kangjie, Xu, Yuan, Qiu, Han, Zhang, Tianwei
Adversarial training is an important topic in robust deep learning, but its practical usage has received little attention from the community. In this paper, we aim to resolve a real-world challenge, i.e., training a model on an imbalanced and noisy dataset to achieve high clean accuracy and adversarial robustness, with our proposed Omnipotent Adversarial Training (OAT) strategy. OAT consists of two innovative methodologies to address the imperfections in the training set. We first introduce an oracle into the adversarial training process to help the model learn a correct data-label conditional distribution. This carefully designed oracle can provide correct label annotations for adversarial training. We further propose logits adjustment adversarial training to overcome the data imbalance issue, which can help the model learn a Bayes-optimal distribution. Our comprehensive evaluation results show that OAT outperforms other baselines by more than 20% in clean accuracy and 10% in robust accuracy under complex combinations of data imbalance and label noise. The code can be found at https://github.com/GuanlinLee/OAT.
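A minimal sketch of a logit-adjusted cross-entropy in the spirit of the "logits adjustment adversarial training" named above (cf. Menon et al., 2021); the exact form used by OAT may differ.

```python
# Minimal sketch: logit-adjusted cross-entropy for a long-tailed label
# distribution; intended to be applied to logits on adversarial examples.
import torch
import torch.nn.functional as F


def logit_adjusted_loss(logits: torch.Tensor, targets: torch.Tensor,
                        class_counts: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Adds tau * log(prior) to the logits so the loss compensates for class imbalance."""
    prior = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(prior + 1e-12)
    return F.cross_entropy(adjusted, targets)
```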
A Benchmark of Long-tailed Instance Segmentation with Noisy Labels
Li, Guanlin, Xu, Guowen, Zhang, Tianwei
In this paper, we consider the instance segmentation task on a long-tailed dataset that contains label noise, i.e., some of the annotations are incorrect. Two main reasons make this setting realistic. First, datasets collected from the real world usually obey a long-tailed distribution. Second, for instance segmentation datasets, because one image can contain many instances and some of them are tiny, it is easier to introduce noise into the annotations. Specifically, we propose a new dataset, a large-vocabulary long-tailed dataset containing label noise for instance segmentation. Furthermore, we evaluate previously proposed instance segmentation algorithms on this dataset. The results indicate that noise in the training dataset hampers the model in learning rare categories and decreases the overall performance, inspiring us to explore more effective approaches to address this practical challenge. The code and dataset are available at https://github.com/GuanlinLee/Noisy-LVIS.
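For illustration, the sketch below injects symmetric label noise into a list of instance category labels; the benchmark's actual noise model may be more structured (e.g., confusing visually similar categories).

```python
# Minimal sketch: symmetric label-noise injection for instance annotations.
# The noise rate, seed, and flipping scheme are illustrative assumptions.
import random
from typing import List


def inject_label_noise(labels: List[int], num_classes: int,
                       noise_rate: float = 0.2, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            candidates = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(candidates))   # flip to a different category
        else:
            noisy.append(y)
    return noisy
```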