What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning?

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

arXiv.org Artificial Intelligence 

Sparse-reward reinforcement learning (RL) remains fundamentally hard: without structure, any agent needs Ω(|S||A|/p) samples to recover rewards. We introduce Policy-Aware Matrix Completion (PAMC) as a first concrete step toward a structural reward learning framework. Our key idea is to exploit approximate low-rank + sparse structure in the reward matrix under policy-biased (missing-not-at-random, MNAR) sampling. We prove recovery guarantees with inverse-propensity weighting, and establish a visitation-weighted error-to-regret bound linking completion error to control performance. Importantly, when assumptions weaken, PAMC degrades gracefully: confidence intervals widen and the algorithm abstains, ensuring safe fallback to exploration. Empirically, PAMC improves sample efficiency across Atari-26 (10M steps), DM Control, MetaWorld MT50, D4RL offline RL, and preference-based RL benchmarks, outperforming DrQ-v2, DreamerV3, Agent57, T-REX/D-REX, and PrefPPO under compute-normalized comparisons. Our results highlight PAMC as a practical and principled tool when structural rewards exist, and as a concrete first instantiation of a broader structural reward learning perspective. What fundamental properties of reward functions determine the sample complexity of reinforcement learning?
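To make the key idea concrete, here is a minimal NumPy sketch of inverse-propensity-weighted low-rank + sparse matrix completion of the kind the abstract describes. This is not the authors' released implementation: the function name `ipw_low_rank_sparse_completion`, the parameter names (`propensity`, `lam`, `lr`), and the choice of a proximal-gradient solver are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def ipw_low_rank_sparse_completion(R_obs, mask, propensity, rank=5,
                                   lam=0.1, lr=0.01, iters=2000, seed=0):
    """Fit R ~ U V^T + S from policy-biased (MNAR) observations.

    R_obs:      (|S|, |A|) matrix of observed rewards, arbitrary where unobserved
    mask:       boolean (|S|, |A|) matrix, True where (s, a) was visited
    propensity: estimated visitation probability of each (s, a); assumed
                bounded away from zero on observed entries
    """
    rng = np.random.default_rng(seed)
    n_s, n_a = R_obs.shape
    U = 0.01 * rng.standard_normal((n_s, rank))    # low-rank factors
    V = 0.01 * rng.standard_normal((n_a, rank))
    S = np.zeros_like(R_obs, dtype=float)          # sparse component

    # Inverse-propensity weights debias the policy-induced sampling;
    # normalizing them keeps the fixed step size lr on a sane scale.
    w = np.where(mask, 1.0 / np.clip(propensity, 1e-3, None), 0.0)
    w /= w[mask].mean()

    for _ in range(iters):
        resid = w * (U @ V.T + S - R_obs)          # weighted residual on observed entries
        U, V = U - lr * (resid @ V), V - lr * (resid.T @ U)  # gradient step on factors
        S = soft_threshold(S - lr * resid, lr * lam)         # prox step keeps S sparse
    return U @ V.T + S
```

Soft-thresholding on S is what separates occasional outlier rewards from the low-rank bulk, and the clip on the propensities plays the role of the bounded-propensity assumption under which IPW estimators stay well behaved.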
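The error-to-regret bound is only stated informally in the abstract; the calculation below is not the paper's theorem, just the standard simulation-lemma-style argument that shows why a visitation-weighted completion error is the right quantity to control. Writing $d^\pi$ for the normalized discounted state-action visitation distribution of a policy $\pi$:

```latex
% Discounted value of a fixed policy \pi under reward R:
%   V^\pi_R = \frac{1}{1-\gamma}\,\mathbb{E}_{(s,a)\sim d^\pi}[R(s,a)]
% Evaluating the same policy under the completed reward \hat R gives
\bigl| V^\pi_{\hat R} - V^\pi_R \bigr|
  = \frac{1}{1-\gamma}\,
    \Bigl| \mathbb{E}_{(s,a)\sim d^\pi}\bigl[\hat R(s,a) - R(s,a)\bigr] \Bigr|
  \le \frac{1}{1-\gamma}\,
    \mathbb{E}_{(s,a)\sim d^\pi}\Bigl[\bigl|\hat R(s,a) - R(s,a)\bigr|\Bigr].
```

Errors on state-action pairs the policy rarely visits are downweighted by $d^\pi$, which is why policy-biased sampling and a policy-weighted error guarantee fit together naturally.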
