Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads

Neural Information Processing Systems 

Pre-trained Language Models (LMs) exhibit strong zero-shot and in-context learning capabilities; however, their behaviors are often difficult to control.
