Show and Guide: Instructional-Plan Grounded Vision and Language Model