MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
