Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

Neural Information Processing Systems 

Aligning language models (LMs) with human preferences can be resource-intensive and technically challenging.