VTON-VLLM: Aligning Virtual Try-On Models with Human Preferences
Neural Information Processing Systems
Diffusion models have achieved remarkable success in the virtual try-on (VTON) task, yet they often fall short of user expectations for visual quality and detail preservation. To alleviate this issue, we curate a dataset of synthesized VTON images annotated with human judgments across multiple perceptual criteria. A vision large language model (VLLM), namely VTON-VLLM, is then trained on these annotations. VTON-VLLM functions as a unified "fashion expert" that can both evaluate VTON synthesis and steer it towards human preferences.
Jun-22-2026, 22:26:51 GMT
- Genre:
  - Research Report > Experimental Study (1.00)
- Industry:
  - Information Technology (0.94)
- Technology:
  - Information Technology > Artificial Intelligence
    - Vision (1.00)
    - Machine Learning > Neural Networks (1.00)
    - Natural Language > Large Language Model (0.67)