Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

Neural Information Processing Systems 

However, recent studies show that the alignment can be easily compromised through finetuning with only a few adversarially designed training examples.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found