Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models
