Introducing Dynabench: Rethinking the way we benchmark AI


We've built and are now sharing Dynabench, a first-of-its-kind platform for dynamic data collection and benchmarking in artificial intelligence. It uses both humans and models together "in the loop" to create challenging new data sets that will lead to better, more flexible AI. Dynabench radically rethinks AI benchmarking, using a novel procedure called dynamic adversarial data collection to evaluate AI models. It measures how easily AI systems are fooled by humans, which is a better indicator of a model's quality than what current static benchmarks provide. Ultimately, this metric will better reflect the performance of AI models in the circumstances that matter most: when interacting with people, who behave and react in complex, changing ways that can't be captured in a fixed set of data points.
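To make the idea concrete, here is a minimal sketch of one round of dynamic adversarial data collection. Everything here is illustrative rather than Dynabench's actual API: the toy keyword "model" stands in for a real classifier, and the scripted attempts stand in for human annotators trying to fool it. Examples the model gets wrong are collected as new, challenging data, and the fraction of successful attempts serves as the "how easily is it fooled" metric.

```python
def model_predict(text):
    # Hypothetical stand-in for a real classifier: a naive sentiment rule
    # that any human annotator can easily exploit.
    return "positive" if "good" in text else "negative"


def collect_round(attempts):
    """One round of adversarial collection.

    attempts: list of (text, human_label) pairs, each an annotator's try
    at fooling the model. Returns the model-fooling examples (the new
    data set) and the fool rate (the dynamic metric).
    """
    fooled = [(text, label) for text, label in attempts
              if model_predict(text) != label]
    return fooled, len(fooled) / len(attempts)


# Scripted "human" attempts (illustrative only).
attempts = [
    ("this movie was good", "positive"),   # model agrees: not collected
    ("not good at all", "negative"),       # "good" trips the model: collected
    ("a masterpiece", "positive"),         # model misses positivity: collected
]
new_data, fool_rate = collect_round(attempts)
```

In a real deployment the fooled examples would be verified by other annotators and folded into training data, so each round yields both a harder benchmark and a stronger model.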