Learning to Look: Seeking Information for Decision Making via Policy Factorization

Dass, Shivin, Hu, Jiaheng, Abbatematteo, Ben, Stone, Peter, Martín-Martín, Roberto

arXiv.org Artificial Intelligence 

Intelligent decisions can only be made based on the right information. When operating in an environment, an intelligent agent actively seeks the information it needs to select the right actions and proceeds with the task only when it is confident enough. For example, when following a video recipe, a chef looks at the TV to find out which ingredient to grasp next, and later looks at a timer to decide when to turn off the stove. In contrast, current learning robots assume that the information needed for manipulation is readily available in their sensor signals (e.g., from a stationary camera observing a tabletop manipulation setup), or they rely on a low-dimensional state representation predefined by a human (e.g., object pose), which the robot must also be given the means to perceive. In this work, our goal is to endow robots with the capability to learn information-seeking actions that enable manipulation, using the quality of the resulting informed actions as supervision, and to switch between active perception and manipulation based solely on the uncertainty about what manipulation action should come next.

Performing actions to reveal information has been explored previously in the subfields of active and interactive perception. In active perception [1, 2, 3], an agent changes the parameters of its sensors (e.g., camera pose [4, 5, 6] or intrinsic parameters [7, 8, 9]) to infer information such as object pose, shape, or material. Interactive perception [10] solutions go one step further and enable agents to change the state of the environment itself to create information-rich signals that reveal kinematics [11, 12], material [13], or other properties [14, 15, 16, 17].
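The uncertainty-gated switching described above can be sketched in a minimal form. This is an illustrative assumption, not the paper's implementation: here a small ensemble of manipulation policies proxies epistemic uncertainty by its disagreement, and the agent takes an information-seeking action whenever that disagreement exceeds a threshold. All names (`EnsemblePolicy`, `step`) and the threshold value are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class EnsemblePolicy:
    """An ensemble of K linear policies; member disagreement proxies uncertainty."""
    def __init__(self, obs_dim, act_dim, k=5):
        # Randomly initialized members stand in for independently trained policies.
        self.weights = [rng.normal(size=(act_dim, obs_dim)) for _ in range(k)]

    def act_with_uncertainty(self, obs):
        actions = np.stack([w @ obs for w in self.weights])
        mean_action = actions.mean(axis=0)
        uncertainty = actions.var(axis=0).mean()  # average per-dimension variance
        return mean_action, uncertainty

def step(manip_policy, perception_policy, obs, threshold=1.0):
    """Manipulate when confident; otherwise perform an information-seeking action."""
    action, u = manip_policy.act_with_uncertainty(obs)
    if u > threshold:
        return "perceive", perception_policy(obs)
    return "manipulate", action

# Toy usage: an observation the ensemble agrees on vs. one it disagrees on.
manip = EnsemblePolicy(obs_dim=4, act_dim=2)
perceive = lambda obs: np.zeros(2)  # placeholder camera-motion action
mode_certain, _ = step(manip, perceive, np.zeros(4))        # zero obs -> zero variance
mode_uncertain, _ = step(manip, perceive, np.ones(4) * 10)  # large obs -> high variance
```

In a real system, the ensemble would be replaced by whatever uncertainty estimate the learned manipulation policy exposes, and the perception action would move a sensor (e.g., reorient a camera) rather than return zeros.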