Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF