"Unless you have confidence in the ruler's reliability, if you use a ruler to measure a table, you may also be using the table to measure the ruler." Do machine learning researchers solve something huge every time they beat a benchmark? If not, then why do we have these benchmarks? And if a benchmark is beaten every couple of months, research objectives may become more about chasing benchmarks than solving bigger problems. To address these challenges, researchers at Facebook AI have introduced Dynabench, a new platform for dynamic data collection and benchmarking.
In the ever-expanding world of computer hardware and software, benchmarks provide a robust method for comparing quality and performance across different system architectures. From MNIST to ImageNet to GLUE, benchmarks have also come to play a hugely important role in driving and measuring progress in AI research. When introducing any new benchmark, it's generally best not to make it so easy that it will quickly become outdated, or so hard that everyone will simply fail. When new models bury benchmarks, which is happening faster and faster in AI these days, researchers must engage in the time-consuming work of making new ones. Facebook believes that the increasing benchmark saturation in recent years, especially in natural language processing (NLP), means it's time to "radically rethink the way AI researchers do benchmarking and to break free of the limitations of static benchmarks." Their solution is a new research platform for dynamic data collection and benchmarking called Dynabench, which they propose will offer a more accurate and sustainable way of evaluating progress in AI.
Benchmarking is a crucial step in developing ever more sophisticated artificial intelligence. It provides a helpful abstraction of the AI's capabilities and gives researchers a firm sense of how well the system is performing on specific tasks. But benchmarks are not without their drawbacks. Once an algorithm masters the static dataset from a given benchmark, researchers have to undertake the time-consuming process of developing a new one to further improve the AI. As AIs have improved over time, researchers have had to build new benchmarks with increasing frequency.
Advances in artificial intelligence depend on continual testing of massive amounts of data. This benchmark testing allows researchers to determine how "intelligent" AI is, spot weaknesses and then develop stronger, smarter models. The process, however, is time-consuming. When an AI system tackles a series of computer-generated tasks and eventually reaches peak performance, researchers must go back to the drawing board and design newer, more complex projects to further bolster AI's performance. Facebook announced this week it has found a better tool to undertake this task: people.
Benchmarks can be very misleading, says Douwe Kiela at Facebook AI Research, who led the team behind the tool. Focusing too much on benchmarks can mean losing sight of wider goals. The test can become the task. "You end up with a system that is better at the test than humans are but not better at the overall task," he says. "It's very deceiving, because it makes it look like we're much further than we actually are."