Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning