Going Beyond Heuristics by Imposing Policy Improvement as a Constraint
Chi-Chang Lee
Neural Information Processing Systems
In many reinforcement learning (RL) applications, incorporating heuristic rewards alongside the task reward is crucial for achieving desirable performance. Heuristics encode prior human knowledge about how a task should be done, providing valuable hints for RL algorithms. However, such hints may be suboptimal, limiting the performance of learned policies. The established way of using heuristics is to modify the heuristic reward so that the optimal policy learned with it remains identical to the optimal policy for the task reward alone (i.e., optimal policy invariance). However, these methods often fail in practical scenarios with limited training data.
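The policy-invariant reward modification the abstract refers to is, in its classical form, potential-based reward shaping (Ng et al., 1999). Below is a minimal sketch, assuming the heuristic is expressed as a potential function `phi` over states; the goal-distance potential used here is a hypothetical illustration, not taken from the paper.

```python
import numpy as np

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based reward shaping (Ng et al., 1999).

    Adding F(s, s') = gamma * phi(s') - phi(s) to the task reward
    preserves the optimal policy (optimal policy invariance),
    however good or bad the heuristic potential phi happens to be.
    """
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical heuristic: negative Euclidean distance to a goal state.
goal = np.array([1.0, 1.0])
phi = lambda s: -np.linalg.norm(np.asarray(s) - goal)

# Example transition: moving toward the goal earns a positive bonus
# even when the task reward itself is zero.
r_task = 0.0
print(shaped_reward(r_task, s=[0.0, 0.0], s_next=[0.5, 0.5], phi=phi))
```

Because the gamma * phi(s') - phi(s) term telescopes along any trajectory, it shifts every policy's value by the same state-dependent offset, so the ranking of policies, and hence the optimal policy, is unchanged. This is the invariance guarantee that, per the abstract, can still fail to yield good policies when training data is limited.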