Embodied Referring Expression Comprehension in Human-Robot Interaction