REBEL: Reinforcement Learning via Regressing Relative Rewards