CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Neural Information Processing Systems
Diffusion models have demonstrated great success in text-to-image generation. However, alleviating the misalignment between text prompts and images remains challenging. We break the problem down into two causes: concept ignorance and concept mismapping. To tackle these two challenges, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with an image-to-text concept matching mechanism. First, we introduce a novel image-to-text concept activation module that guides the diffusion model to revisit ignored concepts.
May-31-2025, 04:33:06 GMT