Robust Domain Generalization for Multi-modal Object Recognition