Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

Neural Information Processing Systems 

Public LLMs such as Llama 2-Chat undergo alignment training and are considered safe. Recently, Qi et al. [2024] reported that even benign fine-tuning on seemingly safe datasets can elicit unsafe behaviors from these models.
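To make the role of prompt templates concrete, the sketch below renders the same fine-tuning example with and without a safety system prompt, loosely following the Llama 2-Chat template format. The function name, the example text, and the system prompt wording are illustrative assumptions, not taken from the paper.

```python
from typing import Optional

# Hypothetical safety system prompt (wording is an assumption, in the
# spirit of Llama 2-Chat's default system prompt).
SAFETY_SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe."
)

def render_llama2_chat(user_msg: str, response: str,
                       system: Optional[str] = None) -> str:
    """Render one (instruction, response) pair in a Llama 2-Chat-style template.

    With `system` set, the system prompt is wrapped in <<SYS>> tags inside
    the [INST] block; without it, the instruction appears alone.
    """
    if system is not None:
        return (f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
                f"{user_msg} [/INST] {response} </s>")
    return f"<s>[INST] {user_msg} [/INST] {response} </s>"

# The same training example under the two template choices:
example = ("Summarize this article.", "Here is a summary: ...")
with_safety = render_llama2_chat(*example, system=SAFETY_SYSTEM_PROMPT)
without_safety = render_llama2_chat(*example)
```

Whether fine-tuning data is rendered with or without such a system prompt (and which template is used at test time) is exactly the design choice the paper's title refers to.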
