Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards
–Neural Information Processing Systems
Recent work demonstrated that using a memory buffer of previous successful trajectories can result in more effective policies. However, existing methods may overly exploit past successful experiences, which can encourage the agent to adopt sub-optimal and myopic behaviors.
Neural Information Processing Systems
Nov-13-2025, 16:06:35 GMT
- Country:
- North America
- Canada (0.04)
- United States > Michigan (0.04)
- North America
- Industry:
- Leisure & Entertainment > Games (0.47)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning
- Neural Networks (0.68)
- Reinforcement Learning (0.96)
- Natural Language (1.00)
- Representation & Reasoning (1.00)
- Robots (1.00)
- Machine Learning
- Information Technology > Artificial Intelligence