Reviews: Mapping State Space using Landmarks for Universal Goal Reaching

Jan-23-2025, 01:47:15 GMT–Neural Information Processing Systems

The paper presents a semi-parametric model for long-horizon planning in a general class of problems. It trains a parametric goal-conditioned policy that is accurate only over local distances (i.e. when the current state and the goal are within some distance threshold), and it uses the replay buffer to non-parametrically sample a graph of landmarks between which the local goal-conditioned policy can reliably navigate. Reaching an arbitrary goal state is then accomplished by (1) moving to the closest landmark using the goal-conditioned policy, (2) planning a path to the landmark closest to the goal using value iteration on the (low-dimensional) landmark graph, and (3) using the goal-conditioned policy to get from that final landmark to the goal state. The paper essentially tackles the problem that goal-conditioned policies, or Universal Value Function Approximators (UVFA), degrade substantially in performance as the planning horizon increases. Because the replay buffer supplies waypoints and the algorithm only ever has to plan locally between them, accuracy is maintained over longer ranges.
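The three-step procedure summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: `local_distance` is a hypothetical stand-in (plain Euclidean distance) for the distance a learned UVFA would estimate, the landmark list is assumed to be given rather than sampled from a replay buffer, and all function names and the `threshold` parameter are illustrative assumptions.

```python
import math

def local_distance(a, b):
    # Hypothetical stand-in for a learned local distance estimate (e.g. from a UVFA).
    return math.dist(a, b)

def build_graph(landmarks, threshold):
    # Connect two landmarks only when the local estimate is trusted (within threshold).
    edges = {i: {} for i in range(len(landmarks))}
    for i, a in enumerate(landmarks):
        for j, b in enumerate(landmarks):
            d = local_distance(a, b)
            if i != j and 0 < d <= threshold:
                edges[i][j] = d
    return edges

def value_iteration(edges, target):
    # Cost-to-go to `target` on the landmark graph via repeated Bellman backups.
    cost = [math.inf] * len(edges)
    cost[target] = 0.0
    for _ in range(len(edges)):
        changed = False
        for i, neighbours in edges.items():
            for j, d in neighbours.items():
                if d + cost[j] < cost[i]:
                    cost[i] = d + cost[j]
                    changed = True
        if not changed:
            break
    return cost

def plan(start, goal, landmarks, threshold):
    # Returns the waypoint sequence start -> landmarks -> goal, or None if disconnected.
    def nearest(state):
        return min(range(len(landmarks)),
                   key=lambda i: local_distance(state, landmarks[i]))

    edges = build_graph(landmarks, threshold)
    src, dst = nearest(start), nearest(goal)   # steps (1) and (3) endpoints
    cost = value_iteration(edges, dst)         # step (2)
    if math.isinf(cost[src]):
        return None
    path = [src]
    while path[-1] != dst:
        cur = path[-1]
        # Greedy step with respect to the converged cost-to-go.
        path.append(min(edges[cur], key=lambda j: edges[cur][j] + cost[j]))
    return [start] + [landmarks[i] for i in path] + [goal]

if __name__ == "__main__":
    landmarks = [(0, 0), (1, 0), (2, 0), (3, 0)]
    print(plan((0.2, 0.0), (2.9, 0.0), landmarks, threshold=1.5))
```

In an actual system each consecutive pair of waypoints would be handed to the goal-conditioned policy as a short-horizon subgoal; the sketch only covers the graph-level planning.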

goal-conditioned policy, navigation, replay buffer

