Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms