Multi-modal Visual Understanding with Prompts for Semantic Information Disentanglement of Image
