Goto

Collaborating Authors

 alignment stage




SafeCiM: Investigating Resilience of Hybrid Floating-Point Compute-in-Memory Deep Learning Accelerators

Bhattacharya, Swastik, Das, Sanjay, Menon, Anand, Kundu, Shamik, Raha, Arnab, Basu, Kanad

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) continue to grow in complexity with Large Language Models (LLMs) incorporating vast numbers of parameters. Handling these parameters efficiently in traditional accelerators is limited by data-transmission bottlenecks, motivating Compute-in-Memory (CiM) architectures that integrate computation within or near memory to reduce data movement. Recent work has explored CiM designs using Floating-Point (FP) and Integer (INT) operations. FP computations typically deliver higher output quality due to their wider dynamic range and precision, benefiting precision-sensitive Generative AI applications. These include models such as LLMs, thus driving advancements in FP-CiM accelerators. However, the vulnerability of FP-CiM to hardware faults remains underexplored, posing a major reliability concern in mission-critical settings. To address this gap, we systematically analyze hardware fault effects in FP-CiM by introducing bit-flip faults at key computational stages, including digital multipliers, CiM memory cells, and digital adder trees. Experiments with Convolutional Neural Networks (CNNs) such as AlexNet and state-of-the-art LLMs including LLaMA-3.2-1B and Qwen-0.3B-Base reveal how faults at each stage affect inference accuracy. Notably, a single adder fault can reduce LLM accuracy to 0%. Based on these insights, we propose a fault-resilient design, SafeCiM, that mitigates fault impact far better than a naive FP-CiM with a pre-alignment stage. For example, with 4096 MAC units, SafeCiM reduces accuracy degradation by up to 49x for a single adder fault compared to the baseline FP-CiM architecture.


LENS: Learning to Segment Anything with Unified Reinforced Reasoning

Zhu, Lianghui, Ouyang, Bin, Zhang, Yuxuan, Cheng, Tianheng, Hu, Rui, Shen, Haocheng, Ran, Longjin, Chen, Xiaoxin, Yu, Li, Liu, Wenyu, Wang, Xinggang

arXiv.org Artificial Intelligence

Text-prompted image segmentation enables fine-grained visual understanding and is critical for applications such as human-computer interaction and robotics. However, existing supervised fine-tuning methods typically ignore explicit chain-of-thought (CoT) reasoning at test time, which limits their ability to generalize to unseen prompts and domains. To address this issue, we introduce LENS, a scalable reinforcement-learning framework that jointly optimizes the reasoning process and segmentation in an end-to-end manner. We propose unified reinforcement-learning rewards that span sentence-, box-, and segment-level cues, encouraging the model to generate informative CoT rationales while refining mask quality. Using a publicly available 3-billion-parameter vision-language model, i.e., Qwen2.5-VL-3B-Instruct, LENS achieves an average cIoU of 81.2% on the RefCOCO, RefCOCO+, and RefCOCOg benchmarks, outperforming the strong fine-tuned method, i.e., GLaMM, by up to 5.6%. These results demonstrate that RL-driven CoT reasoning significantly enhances text-prompted segmentation and offers a practical path toward more generalizable Segment Anything models (SAM). Code is available at https://github.com/hustvl/LENS.


Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning

Liu, Guozhi, Mu, Qi, Huang, Tiansheng, Wang, Xinhua, Shen, Li, Lin, Weiwei, Li, Zhang

arXiv.org Artificial Intelligence

Harmful fine-tuning issues present significant safety challenges for fine-tuning-as-a-service in large language models. Existing alignment-stage defenses, e.g., Vaccine, Repnoise, Booster, and T-Vaccine, mitigate harmful fine-tuning issues by enhancing the model's robustness during the alignment phase. While these methods have been proposed to mitigate the issue, they often overlook a critical upstream factor: the role of the original safety-alignment data. We observe that their defense performance and computational efficiency remain constrained by the quality and composition of the alignment dataset. To address this limitation, we propose Pharmacist, a safety alignment data curation solution that enhances defense against harmful fine-tuning by selecting a high-quality and safety-critical core subset from the original alignment data. The core idea of Pharmacist is to train an alignment data selector to rank alignment data. Specifically, up-ranking high-quality and safety-critical alignment data, down-ranking low-quality and non-safety-critical data. Empirical results indicate that models trained on datasets selected by Pharmacist outperform those trained on datasets selected by existing selection methods in both defense and inference performance. In addition, Pharmacist can be effectively integrated with mainstream alignment-stage defense methods. For example, when applied to RepNoise and T-Vaccine, using the dataset selected by Pharmacist instead of the full dataset leads to improvements in defense performance by 2.60\% and 3.30\%, respectively, and enhances inference performance by 3.50\% and 1.10\%. Notably, it reduces training time by 56.83\% and 57.63\%, respectively. Our code is available at https://github.com/Lslland/Pharmacist.




Improving Zero-Shot Object-Level Change Detection by Incorporating Visual Correspondence

Nguyen, Hung Huy, Rahmanzadehgervi, Pooyan, Mai, Long, Nguyen, Anh Totti

arXiv.org Artificial Intelligence

Detecting object-level changes between two images across possibly different views is a core task in many applications that involve visual inspection or camera surveillance. Existing change-detection approaches suffer from three major limitations: (1) lack of evaluation on image pairs that contain no changes, leading to unreported false positive rates; (2) lack of correspondences (i.e., localizing the regions before and after a change); and (3) poor zero-shot generalization across different domains. To address these issues, we introduce a novel method that leverages change correspondences (a) during training to improve change detection accuracy, and (b) at test time, to minimize false positives. That is, we harness the supervision labels of where an object is added or removed to supervise change detectors, improving their accuracy over previous work by a large margin. Our work is also the first to predict correspondences between pairs of detected changes using estimated homography and the Hungarian algorithm. Our model demonstrates superior performance over existing methods, achieving state-of-the-art results in change detection and change correspondence accuracy across both in-distribution and zero-shot benchmarks.


The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models

Ovalle, Anaelia, Pavasovic, Krunoslav Lehman, Martin, Louis, Zettlemoyer, Luke, Smith, Eric Michael, Williams, Adina, Sagun, Levent

arXiv.org Artificial Intelligence

Content Warning: This paper contains examples of offensive transphobic content. Natural-language assistants are designed to provide users with helpful responses while avoiding harmful outputs, largely achieved through alignment to human preferences. Yet there is limited understanding of whether alignment techniques may inadvertently perpetuate or even amplify harmful biases inherited from their pre-aligned base models. This issue is compounded by the choice of bias evaluation benchmarks in popular preference-finetuned models, which predominantly focus on dominant social categories, such as binary gender, thereby limiting insights into biases affecting underrepresented groups. Towards addressing this gap, we center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias in LLMs. Our key contributions include: 1) a comprehensive survey of bias evaluation modalities across leading preference-finetuned LLMs, highlighting critical gaps in genderdiverse representation, 2) systematic evaluation of gender-diverse biases across 12 models spanning Direct Preference Optimization (DPO) stages, uncovering harms popular bias benchmarks fail to detect, and 3) a flexible framework for measuring harmful biases in implicit reward signals applicable to other social contexts. Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning (SFT), and can amplify two forms of real-world gender-diverse harms from their base models: stigmatization and gender non-affirmative language. We conclude with recommendations tailored to DPO and broader alignment practices, advocating for the adoption of community-informed bias evaluation frameworks to more effectively identify and address underrepresented harms in LLMs.


Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation

Huang, Tiansheng, Hu, Sihao, Ilhan, Fatih, Tekin, Selim Furkan, Liu, Ling

arXiv.org Artificial Intelligence

Harmful fine-tuning issue \citep{qi2023fine} poses serious safety concerns for Large language models' fine-tuning-as-a-service. While existing defenses \citep{huang2024vaccine,rosati2024representation} have been proposed to mitigate the issue, their performances are still far away from satisfactory, and the root cause of the problem has not been fully recovered. For the first time in the literature, we in this paper show that \textit{harmful perturbation} over the model weights should be the root cause of alignment-broken of harmful fine-tuning. In order to attenuate the negative impact of harmful perturbation, we propose an alignment-stage solution, dubbed Booster. Technically, along with the original alignment loss, we append a loss regularizer in the alignment stage's optimization. The regularizer ensures that the model's harmful loss reduction before/after simulated harmful perturbation is attenuated, thereby mitigating the subsequent fine-tuning risk. Empirical results show that Booster can effectively reduce the harmful score of the fine-tuned models while maintaining the performance of downstream tasks. Our code is available at \url{https://github.com/git-disl/Booster}.