Reviews: Mapping State Space using Landmarks for Universal Goal Reaching

Neural Information Processing Systems

The paper presents a semi-parametric model for long-horizon planning in a general class of goal-reaching problems. It trains a parametric goal-conditioned policy that only needs to be accurate over local distances (i.e., when the current state and the goal are within some distance threshold), and it uses the replay buffer to non-parametrically sample a graph of landmarks between which the local goal-conditioned policy can reliably navigate. Moving to any goal state is then accomplished by (1) moving to the closest landmark using the goal-conditioned policy, (2) planning a path to the landmark closest to the goal using value iteration on the (low-dimensional) graph of landmarks, and (3) using the goal-conditioned policy to reach the goal state from that landmark. The paper essentially tackles the problem that goal-conditioned policies, or Universal Value Function Approximators (UVFAs), degrade substantially in performance as the planning horizon increases. By using states from the replay buffer as waypoints between which the algorithm plans locally, accuracy is maintained over longer ranges.
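The three-step procedure described in this review can be illustrated with a small sketch. This is not the authors' implementation: the function names (`local_distance`, `build_graph`, `distances_to`, `plan_waypoints`), the 2D toy states, and the hand-picked landmark list are all hypothetical. In particular, `local_distance` uses Euclidean distance as a stand-in for the distance estimate a learned goal-conditioned value function would supply, and the landmarks are listed by hand rather than sampled from a replay buffer.

```python
import math

def local_distance(a, b):
    # Assumption: Euclidean distance stands in for the distance implied by
    # the learned goal-conditioned value function, trusted only at short range.
    return math.dist(a, b)

def build_graph(landmarks, threshold):
    # Connect two landmarks only if the local estimate is within the threshold.
    n = len(landmarks)
    w = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = local_distance(landmarks[i], landmarks[j])
            if i != j and d <= threshold:
                w[i][j] = d
    return w

def distances_to(w, target):
    # Value-iteration-style sweeps: dist[i] = min_j (w[i][j] + dist[j]).
    n = len(w)
    dist = [math.inf] * n
    dist[target] = 0.0
    for _ in range(n - 1):
        for i in range(n):
            if i != target:
                dist[i] = min(w[i][j] + dist[j] for j in range(n))
    return dist

def plan_waypoints(start, goal, landmarks, threshold):
    n = len(landmarks)
    nearest = lambda s: min(range(n), key=lambda i: local_distance(s, landmarks[i]))
    first, last = nearest(start), nearest(goal)  # steps (1) and (3) endpoints
    w = build_graph(landmarks, threshold)
    dist = distances_to(w, last)                 # step (2): plan on the graph
    if math.isinf(dist[first]):
        raise ValueError("no landmark path connects start and goal")
    path = [first]
    while path[-1] != last:
        i = path[-1]
        # Greedy next hop with respect to the converged distances.
        path.append(min(range(n), key=lambda j: w[i][j] + dist[j]))
    return [landmarks[i] for i in path] + [goal]

# Toy example: landmarks along an L-shaped route.
landmarks = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
waypoints = plan_waypoints((0.2, 0.1), (1.9, 2.2), landmarks, threshold=1.1)
print(waypoints)
```

In a real agent, each consecutive pair of waypoints would be handed to the local goal-conditioned policy as a short-range subgoal, so the policy is never queried beyond the range where its value estimates are reliable.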


Reviews: Mapping State Space using Landmarks for Universal Goal Reaching

Neural Information Processing Systems

Reviewers liked the approach of combining local navigation using a UVFA trained by HER with global planning based on shortest paths in a graph constructed from a buffer of landmarks. At the same time, reviewers had some concerns regarding the clarity of presentation, the similarity of the proposed approach to existing work (namely "Semi-parametric topological memory for navigation" by Savinov et al.), and how specific the proposed algorithm is to navigation problems. The rebuttal provided additional experiments on several new domains and included additional discussion of related work, leading one reviewer to raise their score. In the end, the ACs found this work to be sufficiently new and promising to warrant acceptance, but we ask the authors to 1) address the concerns regarding related work, including Savinov et al., and 2) include a clearer statement of the full approach (an algorithm block would be great) in the camera-ready version.


Mapping State Space using Landmarks for Universal Goal Reaching

Neural Information Processing Systems

An agent that has understood its environment well should be able to apply its skills to any given goal, leading to the fundamental problem of learning the Universal Value Function Approximator (UVFA). A UVFA learns to predict the cumulative rewards between all state-goal pairs. However, empirically, the value function for long-range goals is always hard to estimate and may consequently result in a failed policy. This has presented challenges to the learning process and the capability of neural networks. We propose a method to address this issue in large MDPs with sparse rewards, in which exploration and routing across remote states are both extremely challenging.

