RLTHF: Targeted Human Feedback for LLM Alignment