Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach
