Rethinking KL Regularization in RLHF: From Value Estimation to Gradient Optimization
