Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions
