ASR-enhanced Multimodal Representation Learning for Cross-Domain Product Retrieval