Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor
