Bridging Bots: from Perception to Action via Multimodal-LMs and Knowledge Graphs