Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
