Efficient RLHF: Reducing the Memory Usage of PPO
