accine
- Europe > Bosnia and Herzegovina > Federation of Bosnia and Herzegovina > Sarajevo Canton > Sarajevo (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > South Korea > Gangwon-do > Pyeongchang (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.94)
- Information Technology (0.93)
- Leisure & Entertainment > Sports (0.67)
- Law Enforcement & Public Safety (0.67)
Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning
Liu, Guozhi, Mu, Qi, Huang, Tiansheng, Wang, Xinhua, Shen, Li, Lin, Weiwei, Li, Zhang
Harmful fine-tuning poses significant safety challenges for fine-tuning-as-a-service in large language models. Existing alignment-stage defenses, e.g., Vaccine, RepNoise, Booster, and T-Vaccine, mitigate harmful fine-tuning by enhancing the model's robustness during the alignment phase. However, these methods often overlook a critical upstream factor: the role of the original safety-alignment data. We observe that their defense performance and computational efficiency remain constrained by the quality and composition of the alignment dataset. To address this limitation, we propose Pharmacist, a safety alignment data curation solution that enhances defense against harmful fine-tuning by selecting a high-quality and safety-critical core subset from the original alignment data. The core idea of Pharmacist is to train an alignment data selector that ranks alignment data, up-ranking high-quality and safety-critical samples and down-ranking low-quality and non-safety-critical ones. Empirical results indicate that models trained on datasets selected by Pharmacist outperform those trained on datasets selected by existing selection methods in both defense and inference performance. In addition, Pharmacist can be effectively integrated with mainstream alignment-stage defense methods. For example, when applied to RepNoise and T-Vaccine, using the dataset selected by Pharmacist instead of the full dataset improves defense performance by 2.60% and 3.30%, respectively, and inference performance by 3.50% and 1.10%. Notably, it also reduces training time by 56.83% and 57.63%, respectively. Our code is available at https://github.com/Lslland/Pharmacist.
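As a rough illustration of the rank-then-select step described in the abstract, the sketch below keeps the top-scoring fraction of an alignment dataset. The scores, the `select_core_subset` helper, and the `keep_ratio` parameter are all hypothetical. The abstract does not say how the selector is trained or how scores are computed, so this should not be read as the authors' implementation.

```python
from typing import List, Sequence


def select_core_subset(
    samples: Sequence[str],
    scores: Sequence[float],
    keep_ratio: float = 0.5,
) -> List[str]:
    """Return the highest-scoring fraction of samples, best first.

    `scores` stands in for the output of a trained data selector
    (hypothetical here); higher means more useful for safety alignment.
    """
    if len(samples) != len(scores):
        raise ValueError("samples and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Number of samples to keep; never fewer than one.
    k = max(1, int(len(samples) * keep_ratio))

    # Rank sample indices by score, highest first.
    ranked = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return [samples[i] for i in ranked[:k]]
```

For instance, calling `select_core_subset(["a", "b", "c", "d"], [0.1, 0.9, 0.4, 0.7], keep_ratio=0.5)` keeps the two highest-scoring items, `"b"` and `"d"`.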
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States (0.04)
- Asia > China > Shandong Province (0.04)
- Health & Medicine > Therapeutic Area > Vaccines (0.74)
- Health & Medicine > Therapeutic Area > Immunology (0.64)
- Europe > Bosnia and Herzegovina > Federation of Bosnia and Herzegovina > Sarajevo Canton > Sarajevo (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > South Korea > Gangwon-do > Pyeongchang (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Vaccines (0.50)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.48)
- Health & Medicine > Therapeutic Area > Immunology (0.40)
Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection
Li, Song, Tan, Yang, Ke, Song, Hong, Liang, Zhou, Bingxin
Immunogenicity prediction is a central topic in reverse vaccinology for finding candidate vaccines that can trigger protective immune responses. Existing approaches typically rely on highly compressed features and simple model architectures, leading to limited prediction accuracy and poor generalizability. To address these challenges, we introduce ProVaccine, a novel deep learning solution with a dual attention mechanism that integrates pre-trained latent vector representations of protein sequences and structures. We also compile the most comprehensive immunogenicity dataset to date, encompassing over 9,500 antigen sequences, structures, and immunogenicity labels from bacteria, viruses, and tumors. Extensive experiments demonstrate that ProVaccine outperforms existing methods across a wide range of evaluation metrics. Furthermore, we establish a post-hoc validation protocol to assess the practical significance of deep learning models in tackling vaccine design challenges. Our work provides an effective tool for vaccine design and sets valuable benchmarks for future research.
- Health & Medicine > Therapeutic Area > Vaccines (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)