Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
