A Study of Value-Aware Eigenoptions
Kotamreddy, Harshil, Machado, Marlos C.
Options, which impose an inductive bias toward temporal and hierarchical structure, offer a powerful framework for reinforcement learning (RL). While effective in sequential decision-making, they are often handcrafted rather than learned. Among approaches for discovering options, eigenoptions have shown strong performance in exploration, but their role in credit assignment remains underexplored. In this paper, we investigate whether eigenoptions can accelerate credit assignment in model-free RL, evaluating them in tabular and pixel-based gridworlds. We find that pre-specified eigenoptions aid not only exploration but also credit assignment, whereas online discovery can bias the agent's experience too strongly and hinder learning. In the context of deep RL, we also propose a method for learning option-values under non-linear function approximation, highlighting the impact of termination conditions on performance. Our findings reveal both the promise and complexity of using eigenoptions, and options more broadly, to simultaneously support credit assignment and exploration in reinforcement learning.
Temporal Abstraction in Reinforcement Learning with the Successor Representation
Machado, Marlos C., Barreto, Andre, Precup, Doina
Reasoning at multiple levels of temporal abstraction is one of the key attributes of intelligence. In reinforcement learning, this is often modeled through temporally extended courses of actions called options. Options allow agents to make predictions and to operate at different levels of abstraction within an environment. Nevertheless, approaches based on the options framework often start with the assumption that a reasonable set of options is known beforehand. When this is not the case, there are no definitive answers for which options one should consider. In this paper, we argue that the successor representation (SR), which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions. To support our claim, we take a big picture view of recent results, showing how the SR can be used to discover options that facilitate either temporally-extended exploration or planning. We cast these results as instantiations of a general framework for option discovery in which the agent's representation is used to identify useful options, which are then used to further improve its representation. This results in a virtuous, never-ending, cycle in which both the representation and the options are constantly refined based on each other. Beyond option discovery itself, we discuss how the SR allows us to augment a set of options into a combinatorially large counterpart without additional learning. This is achieved through the combination of previously learned options. Our empirical evaluation focuses on options discovered for temporally-extended exploration and on the use of the SR to combine them. The results of our experiments shed light on design decisions involved in the definition of options and demonstrate the synergy of different methods based on the SR, such as eigenoptions and the option keyboard.
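As a minimal sketch of the representation this paper builds on: under a fixed policy, the successor representation has the closed form $\Psi = (I - \gamma P)^{-1}$, where entry $\Psi[s, s']$ is the expected discounted number of visits to $s'$ starting from $s$. The 4-state transition matrix below is a toy illustration, not an environment from the paper.

```python
import numpy as np

# Toy 4-state random-walk transition matrix (illustrative only).
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])
gamma = 0.9

# Closed-form successor representation: Psi = (I - gamma * P)^{-1}.
# Psi[s, s'] = expected discounted visit count of s' when starting in s.
psi = np.linalg.inv(np.eye(4) - gamma * P)
print(psi.round(2))
```

Because `P` is row-stochastic, each row of `psi` sums to $1/(1-\gamma)$, which is a quick sanity check on the computation.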
Option Discovery in the Absence of Rewards with Manifold Analysis
Bar, Amitay, Talmon, Ronen, Meir, Ron
Options have been shown to be an effective tool in reinforcement learning, facilitating improved exploration and learning. In this paper, we present an approach based on spectral graph theory and derive an algorithm that systematically discovers options without access to a specific reward or task assignment. As opposed to the common practice used in previous methods, our algorithm makes full use of the spectrum of the graph Laplacian. Incorporating modes associated with higher graph frequencies unravels domain subtleties, which are shown to be useful for option discovery. Using geometric and manifold-based analysis, we present a theoretical justification for the algorithm. In addition, we showcase its performance in several domains, demonstrating clear improvements compared to competing methods.
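The spectral machinery underlying this line of work can be sketched in a few lines: build the combinatorial graph Laplacian $L = D - A$ of the state-transition graph and eigendecompose it. Low-frequency eigenvectors vary smoothly over the graph, while higher-frequency modes (the part of the spectrum this paper argues should not be discarded) capture finer structure. The 6-node ring graph below is an illustrative stand-in, not a domain from the paper.

```python
import numpy as np

# Adjacency matrix of a 6-node ring graph (illustrative only).
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

# Combinatorial graph Laplacian: L = D - A.
L = np.diag(A.sum(axis=1)) - A

# eigh returns eigenvalues in ascending order; the eigenvectors are the
# "modes" that spectral option-discovery methods use to define options.
eigvals, eigvecs = np.linalg.eigh(L)
print(eigvals.round(3))
```

For a ring, the spectrum is known in closed form ($2 - 2\cos(2\pi k/n)$), so the smallest eigenvalue is 0 (the constant mode) and the largest here is 4.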
Discovering Options for Exploration by Minimizing Cover Time
Jinnai, Yuu, Park, Jee Won, Abel, David, Konidaris, George
One of the main challenges in reinforcement learning is solving tasks with sparse reward. We show that the difficulty of discovering a distant rewarding state in an MDP is bounded by the expected cover time of a random walk over the graph induced by the MDP's transition dynamics. We therefore propose to accelerate exploration by constructing options that minimize cover time.

Finding a set of edges that minimizes expected cover time is an extremely hard combinatorial optimization problem (Braess, 1968; Braess et al., 2005). Thus, our algorithm instead seeks to minimize the upper bound of the expected cover time, given as a function of the algebraic connectivity of the graph Laplacian (Fiedler, 1973; Broder & Karlin, 1989; Chung, 1996), using the heuristic method by Ghosh & Boyd (2006) that improves the upper bound of the expected cover time of a uniform random walk.
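The quantity being optimized above can be illustrated directly: the algebraic connectivity is the second-smallest eigenvalue of the graph Laplacian, and adding an edge (here, a shortcut between the endpoints of a path graph, standing in for an option) increases it, which tightens the cover-time bound. The graph below is a toy example, not one of the paper's domains or its edge-selection heuristic.

```python
import numpy as np

def algebraic_connectivity(A):
    """Second-smallest eigenvalue of the combinatorial Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))[1]

# Illustrative 6-node path graph.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

before = algebraic_connectivity(A)

# Adding a shortcut edge between the two endpoints raises the
# algebraic connectivity of the graph.
A[0, n - 1] = A[n - 1, 0] = 1.0
after = algebraic_connectivity(A)
print(before, after)
```

On this path graph the connectivity rises from roughly 0.27 to 1.0, since closing the path into a ring makes every node reachable from both directions.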