Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning
Akash Velu, Skanda Vaidyanath, Dilip Arumugam
arXiv.org Artificial Intelligence
Reinforcement learning is the classic paradigm for addressing sequential decision-making problems [47]. While inheriting from supervised learning the fundamental challenge of generalizing across novel states and actions, general-purpose reinforcement-learning agents must also contend with the additional challenges of exploration and credit assignment. Although much of the field's early progress was driven by innovative machinery for tackling credit assignment [45, 46, 44] alongside simple exploration heuristics (ε-greedy exploration, for example), recent years have seen a reversal, with the bulk of attention focused on a broad array of exploration methods (spanning additional heuristics as well as more principled approaches) [51, 38, 15] and relatively little consideration given to credit assignment. This waning interest, however, has not stopped the proliferation of reinforcement learning into novel application areas characterized by long problem horizons and sparse reward signals; indeed, the current reinforcement learning from human feedback (RLHF) paradigm [28] is a widely popularized example of perhaps the harshest such setting, in which a single feedback signal is obtained only after the completion of a long trajectory.
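For reference, the abstract cites ε-greedy as the canonical simple exploration heuristic. Below is a minimal sketch of ε-greedy action selection in Python; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
    """With probability epsilon, pick a uniformly random action (explore);
    otherwise pick the action with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Illustrative usage: 4 actions, 10% exploration rate.
rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2, 0.4])
action = epsilon_greedy(q, epsilon=0.1, rng=rng)
```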
Aug-18-2023