In the ever-expanding world of computer hardware and software, benchmarks provide a robust method for comparing quality and performance across different system architectures. From MNIST to ImageNet to GLUE, benchmarks have also come to play a hugely important role in driving and measuring progress in AI research. When introducing any new benchmark, it's generally best not to make it so easy that it will quickly become outdated, or so hard that everyone will simply fail. When new models bury benchmarks, which is happening faster and faster in AI these days, researchers must engage in the time-consuming work of making new ones. Facebook believes that the increasing benchmark saturation in recent years -- especially in natural language processing (NLP) -- means it's time to "radically rethink the way AI researchers do benchmarking and to break free of the limitations of static benchmarks." Their solution is a new research platform for dynamic data collection and benchmarking called Dynabench, which they propose will offer a more accurate and sustainable way for evaluating progress in AI.
Oct-1-2020, 01:25:08 GMT