Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards
