Learning the Optimal Policy for Balancing Short-Term and Long-Term Rewards
Qinwei Yang

Neural Information Processing Systems 

The DPPL method can obtain optimal policies even when multiple rewards are interrelated.