By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting