Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
