Breadcrumbs to the Goal: Goal-Conditioned Exploration from Human-in-the-Loop Feedback
Marcel Torne, Max Balsells, Zihan Wang
–Neural Information Processing Systems
This procedure can leverage noisy, asynchronous human feedback to learn policies with no hand-crafted reward design or exploration bonuses.
Feb-17-2026, 01:22:27 GMT