Fusing Cross-Domain Knowledge from Multimodal Data to Solve Problems in the Physical World