Leveraging Fully Observable Policies for Learning under Partial Observability
Nguyen, Hai, Baisero, Andrea, Wang, Dian, Amato, Christopher, Platt, Robert
–arXiv.org Artificial Intelligence
In contrast, the setting of fully observable (FO) control has featured the success of many powerful reinforcement learning (RL) algorithms (e.g., [8, 9, 10, 11]). Unfortunately, full observability only holds for a small portion of realistic robotics problems. In this work, we attempt to leverage good fully observable policies (state experts), available only during offline training, to help train partially observable (PO) policies that can execute online. We rely on the setting of offline training and online execution, a successful RL framework in which an agent can use "privileged" information, such as the state [12, 13, 14, 15] or the belief about the state [6], during offline training (e.g., from simulators) to efficiently learn PO policies that can later be deployed without access to the privileged information. In this work, the privileged information is not just the state itself but also the state expert.

Our setting can be illustrated with a navigation task (Figure 1), which requires an agent to navigate to an unknown goal object on the right, identifiable by an object on the left side. While the optimal behavior under partial observability is to first navigate leftwards to identify the goal object, the state expert is able to move to the goal object directly. Despite being suboptimal from the PO perspective, the state expert can provide experience during training that leads to the goal object, which is potentially useful both for exploration and as a part of the policy needed in the PO case after the goal object is identified.

Figure 1: To reach the correct goal object, a state expert takes the red path directly, while a partially observable agent must first take the green path to identify the correct goal object, then take the red path.
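The navigation example can be mimicked with a toy one-dimensional corridor. The sketch below is only an illustration under stated assumptions, not the paper's environment or method: the corridor layout, the goal labels "A" and "B", and the function names `state_expert_actions`, `po_agent_actions`, and `rollout` are all hypothetical. It contrasts the state expert's direct path, the longer information-gathering path a PO agent needs, and what happens when a PO agent copies the expert's actions without seeing the state.

```python
import random

# Hypothetical layout: cells 0..LENGTH-1, cue object at cell 0 (left end),
# two candidate goal objects "A" and "B" at the right end, agent starts in the middle.
LENGTH = 7
START = 3


def state_expert_actions(start, length):
    """The state expert sees the true goal, so it walks right immediately."""
    return ["right"] * (length - 1 - start)


def po_agent_actions(start, length):
    """A PO agent first walks left to read the cue, then walks to the right end."""
    return ["left"] * start + ["right"] * (length - 1)


def rollout(actions, start, true_goal, rng, sees_state):
    """Run an action sequence; return (number of steps, reached correct goal)."""
    pos = start
    known_goal = true_goal if sees_state else None
    for action in actions:
        pos += 1 if action == "right" else -1
        if pos == 0:
            known_goal = true_goal  # the cue at the left end reveals the goal
    # At the right end the agent picks a goal object; it guesses if uninformed.
    choice = known_goal if known_goal is not None else rng.choice(["A", "B"])
    return len(actions), choice == true_goal


if __name__ == "__main__":
    rng = random.Random(0)
    goal = rng.choice(["A", "B"])
    print("state expert:", rollout(state_expert_actions(START, LENGTH), START, goal, rng, True))
    print("PO agent:    ", rollout(po_agent_actions(START, LENGTH), START, goal, rng, False))
    # A PO agent that merely copies the expert's actions never reads the cue.
    hits = sum(
        rollout(state_expert_actions(START, LENGTH), START, rng.choice(["A", "B"]), rng, False)[1]
        for _ in range(1000)
    )
    print("blind imitation success rate:", hits / 1000)
```

In this toy setting the expert's trajectory is still the correct second half of the PO agent's trajectory, which mirrors the point that state-expert experience can be useful once the goal object has been identified.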
Nov-10-2022
- Country:
- Oceania > New Zealand
- North Island > Auckland Region > Auckland (0.04)
- North America > United States
- Massachusetts > Suffolk County > Boston (0.04)
- Asia > Middle East
- Jordan (0.04)
- Genre:
- Research Report (0.82)
- Technology: