Collaborating Authors

 Mitchell Gordon


HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models

Neural Information Processing Systems

Generative models are often assessed with human evaluations that measure the perceived quality of their outputs. Automated metrics are noisy, indirect proxies because they rely on heuristics or pretrained embeddings. Until now, however, direct human evaluation strategies have been ad hoc, neither standardized nor validated. Our work establishes a gold-standard human benchmark for generative realism.

