VTON-VLLM: Aligning Virtual Try-On Models with Human Preferences

Jun-14-2026, 05:36:50 GMT–Neural Information Processing Systems

Diffusion models have achieved remarkable success in the virtual try-on (VTON) task, yet they often fall short of user expectations for visual quality and detail preservation. To address this issue, we curate a dataset of synthesized VTON images annotated with human judgments across multiple perceptual criteria. We then train a vision large language model (VLLM), named VTON-VLLM, on these annotations. VTON-VLLM acts as a unified "fashion expert": it can both evaluate VTON results and steer VTON synthesis toward human preferences.

artificial intelligence, machine learning, natural language
