Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
