Coherence-Driven Multimodal Safety Dialogue with Active Learning for Embodied Agents