VTON-VLLM: Aligning Virtual Try-On Models with Human Preferences
Neural Information Processing Systems
Diffusion models have achieved remarkable success in the virtual try-on (VTON) task, yet they often fall short of user expectations regarding visual quality and detail preservation. To alleviate this issue, we curate a dataset of synthesized VTON images annotated with human judgments across multiple perceptual criteria. A vision large language model (VLLM), namely VTON-VLLM, is then trained on these annotations. VTON-VLLM functions as a unified "fashion expert" and is capable of both evaluating VTON synthesis and steering it towards human preferences.
Jun-14-2026, 05:36:50 GMT