Selecting Fine-Tuning Examples by Quizzing VLMs

Nov-18-2025–arXiv.org Artificial Intelligence

A challenge in fine-tuning text-to-image diffusion models for specific topics is to select good examples. Fine-tuning from image sets of varying quality, such as Wikipedia Commons, will often produce poor output. However, training images that \textit{do} exemplify the target concept (e.g., a \textit{female Mountain Bluebird}) help ensure that the generated images are similarly representative (e.g., have the prototypical blue-wings and gray chest). In this work, we propose QZLoRA, a framework to select images for low-rank adaptation (LoRA). The approach leverages QuizRank, a method to automatically rank images by treating them as an `educational intervention' and `quizzing' a VLM. We demonstrate that QZLoRA can produce better aligned, photorealistic images with fewer samples. We also show that these fine-tuned models can produce stylized that are similarly representative (i.e., illustrations). Our results highlight the promise of combining automated visual reasoning with parameter-efficient fine-tuning for topic-adaptive generative modeling.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Nov-18-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.88)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language > Large Language Model (0.46)
    - Machine Learning > Neural Networks
      - Deep Learning (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found