Going Beyond Heuristics by Imposing Policy Improvement as a Constraint
Chi-Chang Lee
Neural Information Processing Systems
In many reinforcement learning (RL) applications, incorporating heuristic rewards alongside the task reward is crucial for achieving desirable performance. Heuristics encode prior human knowledge about how a task should be done, providing valuable hints for RL algorithms. However, such hints may be suboptimal, limiting the performance of learned policies. The established way of using heuristics is to modify the heuristic reward so that the optimal policy learned with it remains identical to the optimal policy for the task reward alone (i.e., optimal policy invariance). However, these methods often fail in practical scenarios with limited training data.
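The policy-invariant reward modification the abstract refers to is, in its classical form, potential-based reward shaping (Ng et al., 1999). Below is a minimal sketch, assuming the heuristic is expressed as a potential function `phi` over states; the goal-distance potential used here is a hypothetical illustration, not taken from the paper.

```python
import numpy as np

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based reward shaping (Ng et al., 1999).

    Adding F(s, s') = gamma * phi(s') - phi(s) to the task reward
    preserves the optimal policy (optimal policy invariance),
    however good or bad the heuristic potential phi happens to be.
    """
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical heuristic: negative Euclidean distance to a goal state.
goal = np.array([1.0, 1.0])
phi = lambda s: -np.linalg.norm(np.asarray(s) - goal)

# Example transition: moving toward the goal earns a positive bonus
# even when the task reward itself is zero.
r_task = 0.0
print(shaped_reward(r_task, s=[0.0, 0.0], s_next=[0.5, 0.5], phi=phi))
```

Because the gamma * phi(s') - phi(s) term telescopes along any trajectory, it shifts every policy's value by the same state-dependent offset, so the ranking of policies, and hence the optimal policy, is unchanged. This is the invariance guarantee that, per the abstract, can still fail to yield good policies when training data is limited.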