AITopics | sparse reward task

d324a0cc02881779dcda44a675fdcaaa-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-14-2026, 08:19:44 GMT

imitation step, pearl, sample efficiency, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

2b8f621e9244cea5007bac8f5d50e476-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-11-2026, 18:37:41 GMT

expert policy, imitation learning, manuscript, (12 more...)

Technology: Information Technology > Artificial Intelligence (0.56)

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

Neural Information Processing Systems, Dec-25-2025, 11:45:27 GMT

While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima. We introduce a simple and effective model-free method to learn from shaped distance-to-goal rewards on tasks where success depends on reaching a goal state. Our method introduces an auxiliary distance-based reward based on pairs of rollouts to encourage diverse exploration. This approach effectively prevents learning dynamics from stabilizing around local optima induced by the naive distance-to-goal reward shaping and enables policies to efficiently solve sparse reward tasks. Our augmented objective does not require any additional reward engineering or domain expertise to implement and converges to the original sparse objective as the agent learns to solve the task. We demonstrate that our method successfully solves a variety of hard-exploration tasks (including maze navigation and 3D construction in a Minecraft environment), where naive distance-based reward shaping otherwise fails, and intrinsic curiosity and reward relabeling strategies exhibit poor performance.
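
The idea of pairing rollouts can be illustrated with a minimal sketch. The abstract does not give the reward formula, so everything here is an assumption made for illustration: the function name `shaped_reward`, the `success_radius` threshold, the use of Euclidean distance, and the specific form of the auxiliary term (a bonus for ending far from where the paired rollout ended, capped at zero). It is not presented as the authors' implementation.

```python
import math


def shaped_reward(terminal_state, goal, sibling_terminal_state, success_radius=0.5):
    """Hypothetical paired-rollout reward (illustrative assumption only).

    terminal_state:         where this rollout ended
    goal:                   the goal state the task asks for
    sibling_terminal_state: where the paired rollout ended
    success_radius:         assumed threshold for counting the goal as reached
    """
    d_goal = math.dist(terminal_state, goal)
    if d_goal <= success_radius:
        # Goal reached: fall back to the sparse success signal.
        return 1.0
    # Naive shaping term: penalize remaining distance to the goal.
    # Auxiliary term: reward distance from the paired rollout's end state,
    # which discourages both rollouts from settling in the same local optimum.
    d_sibling = math.dist(terminal_state, sibling_terminal_state)
    # Cap at zero so a failed rollout never scores above a successful one.
    return min(0.0, -d_goal + d_sibling)
```

Under these assumptions, a rollout that reaches the goal receives only the sparse reward, so the shaped terms matter less as success becomes common, which is consistent with the abstract's statement that the augmented objective converges to the original sparse objective.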

name change, self-balancing shaped reward, sparse reward task, (6 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

Neural Information Processing Systems, Dec-25-2025, 10:51:42 GMT

The history of learning for control has been an exciting back and forth between two broad classes of algorithms: planning and reinforcement learning. Planning algorithms effectively reason over long horizons, but assume access to a local policy and distance metric over collision-free paths. Reinforcement learning excels at learning policies and relative values of states, but fails to plan over long horizons. Despite the successes of each method on various tasks, long horizon, sparse reward tasks with high-dimensional observations remain exceedingly challenging for both planning and reinforcement learning algorithms. Frustratingly, these sorts of tasks are potentially the most useful, as they are simple to design (a human only needs to provide an example goal state) and avoid injecting bias through reward shaping.

algorithm, bridging planning and reinforcement learning, replay buffer, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reviews: Language as an Abstraction for Hierarchical Deep Reinforcement Learning

Neural Information Processing Systems, Jan-21-2025, 11:23:34 GMT

I believe the proposed method, HAL (Hierarchical Abstraction with Language), is an interesting approach for HRL. The authors adapt Hindsight Experience Replay for instructions (called Hindsight Instruction Relabelling). I have some concerns about the experimental setup and empirical evaluation of the proposed method:

- The motivation behind introducing a new environment is unclear. There are many similar existing environments, such as the crafting environment used by [1] and the compositional and relational navigation environment in [2]. Introducing a new environment (unless it's necessary) hinders proper comparison and benchmarking.

artificial intelligence, hrl method, machine learning, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
