HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models
Zhou, Sharon; Gordon, Mitchell; Krishna, Ranjay; Narcomey, Austin; Fei-Fei, Li F.; Bernstein, Michael
Neural Information Processing Systems
Generative models often use human evaluations to measure the perceived quality of their outputs. Automated metrics are noisy, indirect proxies because they rely on heuristics or pretrained embeddings. Until now, however, direct human evaluation strategies have been ad hoc, neither standardized nor validated. Our work establishes a gold-standard human benchmark for generative realism. We construct Human eYe Perceptual Evaluation (HYPE), a human benchmark that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to produce separable model performances, and (4) efficient in cost and time.
Mar-18-2020, 21:48:13 GMT
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (0.72)
- Natural Language > Generation (0.64)
- Vision (0.78)