Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
