Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Neural Information Processing Systems
Public LLMs such as Llama 2-Chat undergo alignment training and are generally considered safe. Recently, Qi et al. [2024] reported that even benign fine-tuning on seemingly safe datasets can give rise to unsafe behaviors in these models.