Discovering Options for Exploration by Minimizing Cover Time
Jinnai, Yuu, Park, Jee Won, Abel, David, Konidaris, George
arXiv.org Artificial Intelligence
One of the main challenges in reinforcement learning is solving tasks with sparse reward. We show that the difficulty of discovering a distant rewarding state in an MDP is bounded by the expected cover time of a random walk over the graph induced by the MDP's transition dynamics. We therefore propose to accelerate exploration by constructing options that minimize cover time. The …

Finding a set of edges that minimizes expected cover time is an extremely hard combinatorial optimization problem (Braess, 1968; Braess et al., 2005). Thus, our algorithm instead seeks to minimize the upper bound of the expected cover time, given as a function of the algebraic connectivity of the graph Laplacian (Fiedler, 1973; Broder & Karlin, 1989; Chung, 1996), using the heuristic method by Ghosh & Boyd (2006) that improves the upper bound of the expected cover time of a uniform random walk.
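The passage above can be illustrated with a small sketch. The Ghosh & Boyd (2006) heuristic scores each absent edge (i, j) by (v_i − v_j)², where v is the Fiedler vector (the eigenvector for the second-smallest Laplacian eigenvalue, the algebraic connectivity); adding the top-scoring edge tends to raise algebraic connectivity the most, which tightens the cover-time bound. The code below is a minimal NumPy sketch of that one greedy step, not the paper's full covering-options algorithm; all function names are illustrative.

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric 0/1 adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def fiedler(adj):
    """Algebraic connectivity (second-smallest Laplacian eigenvalue)
    and the corresponding eigenvector (the Fiedler vector)."""
    vals, vecs = np.linalg.eigh(laplacian(adj))
    return vals[1], vecs[:, 1]

def best_edge(adj):
    """Greedy step in the spirit of Ghosh & Boyd (2006): among absent
    edges, pick the one maximizing (v_i - v_j)^2 for the Fiedler
    vector v -- a cheap proxy for the edge that most increases
    algebraic connectivity."""
    _, v = fiedler(adj)
    n = adj.shape[0]
    best, score = None, -1.0
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i, j] == 0 and (v[i] - v[j]) ** 2 > score:
                best, score = (i, j), (v[i] - v[j]) ** 2
    return best

# Example: a path graph 0-1-2-3. The Fiedler vector is monotone along
# the path, so the heuristic connects the two endpoints.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1
lam2_before, _ = fiedler(A)
i, j = best_edge(A)
A[i, j] = A[j, i] = 1
lam2_after, _ = fiedler(A)
```

On this example the chosen edge is (0, 3), turning the path into a cycle and strictly increasing the algebraic connectivity, which by the bounds cited above shrinks the expected cover time of a uniform random walk.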
Mar-16-2019