Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards

Alexander Gambashidze, Konstantin Sobolev, Andrey Kuznetsov, Ivan Oseledets

Mar-25-2025, arXiv.org Artificial Intelligence

Can Visual Language Models (VLMs) effectively capture human visual preferences? This work addresses this question by training VLMs to think about preferences at test time, employing reinforcement learning methods inspired by DeepSeek R1 and OpenAI O1. Using datasets such as ImageReward and Human Preference Score v2 (HPSv2), our models achieve accuracies of 64.9% on the ImageReward test set (trained on the official ImageReward split) and 65.4% on HPSv2 (trained on approximately 25% of its data). These results match traditional encoder-based models while providing transparent reasoning and enhanced generalization. This approach makes it possible to use not only the rich world knowledge of VLMs but also their capacity to reason, yielding interpretable outcomes that help decision-making processes. By demonstrating that current VLMs can reason about human visual preferences, we introduce efficient soft-reward strategies for image ranking, outperforming simplistic selection or scoring methods. This reasoning capability enables VLMs to rank arbitrary images, regardless of aspect ratio or complexity, thereby potentially amplifying the effectiveness of visual Preference Optimization. By reducing the need for extensive markup while improving reward generalization and explainability, our findings can be a strong milestone that will enhance text-to-vision models even further.

arxiv preprint, large language model, machine learning

Country:
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.06)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.74)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.37)
