Learning from Watching: Scalable Extraction of Manipulation Trajectories from Human Videos