RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models