Curran
Although we would like our robots to behave completely autonomously, this is often not possible. Some parts of a task may be hard to automate, perhaps due to hard-to-interpret sensor information or a complex environment. In such cases, shared autonomy or teleoperation is preferable to an error-prone autonomous approach. However, deciding which parts of a task to allocate to the human and which to the robot can be tricky. In this work, we introduce A3P, a risk-aware task-level reinforcement learning algorithm that discovers when to hand off subtasks to a human assistant. A3P models a task-level state machine as a Partially Observable Markov Decision Process (POMDP) and explicitly represents failures as additional state-action pairs. Based on this model, the algorithm allows the user to allocate subtasks to the robot or the human in such a way as to manage the worst-case completion time for the overall task.
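To illustrate the allocation objective only (not the A3P algorithm itself, which learns over a POMDP), the following is a minimal sketch: given hypothetical worst-case completion times for each subtask under robot or human execution, brute-force the assignment that minimizes the total worst-case time. All subtask names and timings here are invented for illustration.

```python
from itertools import product

# Hypothetical worst-case completion times (seconds) per subtask,
# under robot vs. human execution. In the full POMDP model, failure
# state-action pairs would inflate the robot's worst case.
worst_case = {
    "grasp":  {"robot": 40.0, "human": 15.0},
    "insert": {"robot": 12.0, "human": 20.0},
    "place":  {"robot": 8.0,  "human": 18.0},
}

def best_allocation(times):
    """Brute-force the robot/human assignment minimizing total worst-case time."""
    subtasks = list(times)
    best = None
    for choice in product(["robot", "human"], repeat=len(subtasks)):
        total = sum(times[s][a] for s, a in zip(subtasks, choice))
        if best is None or total < best[1]:
            best = (dict(zip(subtasks, choice)), total)
    return best

alloc, total = best_allocation(worst_case)
# With the times above: grasp -> human, insert -> robot, place -> robot
```

Brute force is exponential in the number of subtasks; it stands in here for the model-based allocation the paper describes, which additionally accounts for partial observability and failure risk.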