Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations