HL Dataset: Visually-grounded Description of Scenes, Actions and Rationales