REBEL: Reinforcement Learning via Regressing Relative Rewards
