Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation
Samuel Ainsworth, Matt Barnes, Siddhartha Srinivasa
Neural Information Processing Systems
In this paper, we consider the problem of determining when, along a training roll-out, feedback from the environment is no longer beneficial and an intervention, such as resetting the agent to the initial state distribution, is warranted. We show that such interventions can naturally trade off a small sub-optimality gap for a dramatic decrease in sample complexity. In particular, we focus on the reinforcement learning setting in which the agent has access to a reward signal in addition to either (a) an expert supervisor triggering the e-stop mechanism in real time or (b) expert state-only demonstrations used to "learn" an automatic e-stop trigger.
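To make option (b) concrete, here is a minimal sketch of an e-stop learned from state-only demonstrations. It is an illustration under stated assumptions, not the paper's algorithm: it assumes a small discrete state space, treats the set of states seen in the demonstrations as the "safe" support, and ends the episode as soon the agent steps outside that set. The names `learn_support`, `rollout_with_estop`, `chain_step`, and the toy chain environment are all hypothetical.

```python
def learn_support(demos):
    """Collect every state visited in the expert state-only demonstrations."""
    return {state for trajectory in demos for state in trajectory}


def rollout_with_estop(step, policy, start, support, horizon):
    """Run one episode; stop early once the agent leaves `support`.

    Returns (total_reward, visited_states, estop_triggered).
    The reward of the step that leaves the support is not counted.
    """
    state, total, visited = start, 0.0, [start]
    for _ in range(horizon):
        state, reward = step(state, policy(state))
        visited.append(state)
        if state not in support:
            # E-stop: the caller would reset the agent to the start distribution.
            return total, visited, True
        total += reward
    return total, visited, False


# Toy 1-D chain (assumed for illustration): actions are -1 or +1,
# and reaching state 4 pays a reward of 1.
def chain_step(state, action):
    next_state = state + action
    return next_state, (1.0 if next_state == 4 else 0.0)


expert_demos = [[0, 1, 2, 3], [0, 1, 2, 3, 4]]
support = learn_support(expert_demos)

# A policy that walks away from the demonstrated states is cut off after one step.
bad = rollout_with_estop(chain_step, lambda s: -1, 0, support, horizon=4)
# A policy that follows the demonstrated states runs to the horizon.
good = rollout_with_estop(chain_step, lambda s: +1, 0, support, horizon=4)
```

In this sketch the e-stop saves samples by never exploring states the expert did not visit. The cost is that any better route outside the demonstrated support can no longer be found, which is the sub-optimality trade-off the abstract refers to.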
Oct-3-2025, 06:32:14 GMT
- Country:
- North America
- Canada (0.04)
- United States (0.28)
- Genre:
- Research Report > New Finding (0.46)
- Technology: