One-Shot Safety Alignment for Large Language Models via Optimal Dualization

Neural Information Processing Systems 

Ideally, we would like methods that train LMs only once (i.e., one-shot) with a fixed objective, as in
