Reviews: Mapping State Space using Landmarks for Universal Goal Reaching

Neural Information Processing Systems 

The paper presents a semi-parameteric model for long-term planning in a general space of problems. It works by training parametric goal-conditioned policies accurate only on local distances (i.e. when current state and goal are within some distance threshold) and leveraging the replay buffer to non-parametrically sample a graph of landmarks which the local goal-conditioned policy can accurately produce paths between. Moving to any goal state is then accomplished by (1) moving to the closest landmark using the goal-conditioned policy, (2) planning a path to the landmark closest to the goal using value-iteration on the (low-dimensional) graph of landmarks, (3) using the goal-conditioned policy to get to the goal state from the closest landmark. The paper essentially tackles the problem that goal-conditioned policies, or Universal Value Function Approximators (UVFA), degrade substantially in performance as the planning horizon increases. By leveraging the replay buffer to provide way-points for the algorithm to plan locally along, accuracy over longer ranges is maintained.