Explaining and Preventing Alignment Collapse in Iterative RLHF
