Towards Understanding Language through Perception in Situated Human-Robot Interaction: From Word Grounding to Grammar Induction