Privileged Sensing Scaffolds Reinforcement Learning
Hu, Edward S., Springer, James, Rybkin, Oleh, Jayaraman, Dinesh
arXiv.org Artificial Intelligence
We need to look at our shoelaces as we first learn to tie them, but having mastered this skill, we can do it from touch alone. We call this phenomenon "sensory scaffolding": observation streams that are not needed by a master might yet aid a novice learner. We consider such sensory scaffolding setups for training artificial agents. For example, a robot arm may need to be deployed with just a low-cost, robust, general-purpose camera; yet its performance may improve by having privileged training-time-only access to informative albeit expensive and unwieldy motion capture rigs or fragile tactile sensors. For these settings, we propose Scaffolder, a reinforcement learning approach which effectively exploits privileged sensing in critics, world models, reward estimators, and other such auxiliary components that are only used at training time, to improve the target policy. For evaluating sensory scaffolding agents, we design a new "S3" suite of ten diverse simulated robotic tasks that explore a wide range of practical sensor setups. Agents must use privileged camera sensing to train blind hurdlers, privileged active visual perception to help robot arms overcome visual occlusions, privileged touch sensors to train robot hands, and more. Scaffolder easily outperforms relevant prior baselines and frequently performs comparably even to policies that have test-time access to the privileged sensors.

It is well known that Beethoven composed symphonies long after he had fully lost his hearing. Such feats are commonly held to be evidence of mastery: for example, novice typists need to look at the keyboard to locate keys but, with practice, can graduate to typing without looking. Thus, sensing requirements may be different during learning versus after learning.
We refer to this as "sensory scaffolding", drawing inspiration from the concept of scaffolding teaching mechanisms in psychology that provide temporary support for a student (Wood et al., 1976; Vygotsky et al., 2011), like training wheels when learning to ride a bicycle. For artificial learning agents such as robots, sensory scaffolding permits decoupling the observation streams required at test time from those that are used to train the agent. The sensors available in a deployed robot are often decided by practical considerations such as cost, robustness, size, compute requirements, and ease of instrumentation, e.g., autonomous cars with only cheap and robust RGB camera sensors. However, those considerations might carry less weight at training time, so a robot learning practitioner may choose to scaffold policy learning with privileged information (Vapnik & Vashist, 2009) from extra sensors available only at training time. In the case of the cars above, the manufacturer might equip a small fleet of training cars with expensive privileged sensors like lidar to improve RGB-only driving policies for customers to install in their cars.
May-23-2024