Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads
Neural Information Processing Systems
Pre-trained Language Models (LMs) exhibit strong zero-shot and in-context learning capabilities; however, their behaviors are often difficult to control.
Feb-17-2026, 09:49:14 GMT