Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation