IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval