Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning
Akash Velu, Skanda Vaidyanath, Dilip Arumugam
arXiv.org Artificial Intelligence
Reinforcement learning is the classic paradigm for addressing sequential decision-making problems [47]. While inheriting from supervised learning the fundamental challenge of generalizing across novel states and actions, general-purpose reinforcement-learning agents must also contend with the additional challenges of exploration and credit assignment. Although much of the field's early progress was driven by innovative machinery for tackling credit assignment [45, 46, 44] alongside simple exploration heuristics (ε-greedy exploration, for example), recent years have seen a reversal, with the bulk of attention focused on a broad array of exploration methods (spanning additional heuristics as well as more principled approaches) [51, 38, 15] and relatively little consideration given to credit assignment. This waning interest, however, has not stopped the proliferation of reinforcement learning into novel application areas characterized by long problem horizons and sparse reward signals; indeed, the current reinforcement learning from human feedback (RLHF) paradigm [28] is a widely popularized example of perhaps the harshest such setting, in which a single feedback signal is obtained only after the completion of a long trajectory.
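For reference, the abstract cites ε-greedy as the canonical simple exploration heuristic. Below is a minimal sketch of ε-greedy action selection in Python; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
    """With probability epsilon, pick a uniformly random action (explore);
    otherwise pick the action with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Illustrative usage: 4 actions, 10% exploration rate.
rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2, 0.4])
action = epsilon_greedy(q, epsilon=0.1, rng=rng)
```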
Aug-18-2023