Learning the Optimal Policy for Balancing Short-Term and Long-Term Rewards
Neural Information Processing Systems
Learning an optimal policy that balances multiple short-term and long-term rewards has applications across many domains, yet little research addresses policy learning in this setting. In this paper, we aim to learn an optimal policy that effectively balances multiple short-term and long-term rewards, especially when long-term outcomes are often missing because data are difficult to collect over extended periods. The conventional linear weighting method, which aggregates multiple rewards into a single surrogate reward through weighted summation, can achieve only suboptimal policies when the rewards are interrelated. Motivated by this, we propose a novel decomposition-based policy learning (DPPL) method that converts the whole problem into subproblems.
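As a rough illustration of the linear weighting baseline described above, the sketch below collapses a (short-term, long-term) reward pair into a single surrogate reward by weighted summation and picks the best of three candidate policies for a sweep of weights. The reward values, the names `linear_weighting`, `candidates`, and `best_policy`, and the three-policy setup are all hypothetical and are not taken from the paper. The example shows a well-known general limitation of weighted sums (a balanced, non-dominated option can be skipped for every weight), which may or may not be the exact failure mode the authors analyze.

```python
import numpy as np


def linear_weighting(rewards, weights):
    """Collapse reward vectors into one surrogate reward per policy (weighted sum)."""
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return rewards @ weights


# Hypothetical (short-term, long-term) rewards for three candidate policies.
candidates = np.array([
    [1.00, 0.00],  # policy 0: favours the short-term reward
    [0.00, 1.00],  # policy 1: favours the long-term reward
    [0.45, 0.45],  # policy 2: balanced, not dominated by policy 0 or 1
])


def best_policy(weights):
    """Index of the policy that maximizes the surrogate reward."""
    return int(np.argmax(linear_weighting(candidates, weights)))


# Sweep the weight on the short-term reward from 0 to 1.
chosen = {best_policy([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 101)}
print(sorted(chosen))
```

In this toy setup the balanced policy is never selected, because its surrogate reward is 0.45 for every weight while one of the other two policies always scores at least 0.5.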
May-29-2025, 06:45:56 GMT
- Genre:
  - Research Report > Experimental Study (1.00)
- Industry:
  - Education (0.67)
  - Health & Medicine (0.93)
- Technology: