Advances in artificial intelligence depend on continual testing against massive amounts of data. This benchmark testing allows researchers to determine how "intelligent" an AI system is, spot weaknesses and then develop stronger, smarter models. The process, however, is time-consuming. When an AI system tackles a series of computer-generated tasks and eventually reaches peak performance, researchers must go back to the drawing board and design newer, more complex projects to further bolster AI's performance. Facebook announced this week it has found a better tool to undertake this task: people.
Benchmarking is a crucial step in developing ever more sophisticated artificial intelligence. It provides a helpful abstraction of the AI's capabilities and gives researchers a firm sense of how well the system is performing on specific tasks. But benchmarks are not without their drawbacks. Once an algorithm masters the static dataset from a given benchmark, researchers have to undertake the time-consuming process of developing a new one to further improve the AI. As AIs have improved over time, researchers have had to build new benchmarks with increasing frequency.
Facebook's artificial intelligence researchers have a plan to make algorithms smarter by exposing them to human cunning. They want your help to supply the trickery. On Thursday, Facebook's AI lab launched a project called Dynabench that creates a kind of gladiatorial arena in which humans try to trip up AI systems. Challenges include crafting sentences that cause a sentiment-scoring system to misfire, for example by reading a comment as negative when it is actually positive. Another involves tricking a hate speech filter, a potential draw for teens and trolls.
We've built and are now sharing Dynabench, a first-of-its-kind platform for dynamic data collection and benchmarking in artificial intelligence. It uses both humans and models together "in the loop" to create challenging new data sets that will lead to better, more flexible AI. Dynabench radically rethinks AI benchmarking, using a novel procedure called dynamic adversarial data collection to evaluate AI models. It measures how easily AI systems are fooled by humans, which is a better indicator of a model's quality than current static benchmarks provide. Ultimately, this metric will better reflect the performance of AI models in the circumstances that matter most: when interacting with people, who behave and react in complex, changing ways that can't be reflected in a fixed set of data points.
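The idea of measuring how easily humans fool a model can be illustrated with a small sketch. This is not Dynabench's actual code or API: the `Example` class, the `toy_sentiment_model` keyword classifier, and the `adversarial_round` function are all hypothetical names invented for illustration, and the sketch assumes the human-supplied label is always correct, which a real platform would presumably need to verify.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    text: str
    label: str  # the label the human author says is correct (assumed trustworthy here)


def toy_sentiment_model(text: str) -> str:
    """Hypothetical stand-in for a real model: naive negative-keyword matching."""
    negative_words = {"bad", "terrible", "awful", "hate"}
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return "negative" if any(w in negative_words for w in words) else "positive"


def adversarial_round(
    model: Callable[[str], str], attempts: List[Example]
) -> Tuple[List[Example], float]:
    """Run one round: keep the examples that fooled the model and
    report the fraction of human attempts that succeeded."""
    fooled = [ex for ex in attempts if model(ex.text) != ex.label]
    error_rate = len(fooled) / len(attempts) if attempts else 0.0
    return fooled, error_rate


# Human-written attempts to trip up the model (made-up sentences).
attempts = [
    Example("This movie was not bad at all, I loved it.", "positive"),
    Example("I hate how quickly this wonderful book ended.", "positive"),
    Example("The food was terrible.", "negative"),
    Example("What a great day.", "positive"),
]

fooled, error_rate = adversarial_round(toy_sentiment_model, attempts)
print(f"Model was fooled on {len(fooled)} of {len(attempts)} attempts "
      f"(error rate {error_rate:.2f})")
for ex in fooled:
    print("  fooling example:", ex.text)
```

In this sketch the examples that fooled the model are the ones worth keeping: they could be added to the training or test data for the next round, so the benchmark changes as the model improves instead of staying fixed.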
"Unless you have confidence in the ruler's reliability, if you use a ruler to measure a table, you may also be using the table to measure the ruler." Do machine learning researchers solve something huge every time they hit the benchmark? If not, then why do we have these benchmarks? But if a benchmark is breached every couple of months, then research objectives might become more about chasing benchmarks than solving bigger problems. To address these challenges, researchers at Facebook AI have introduced Dynabench, a new platform for dynamic data collection and benchmarking.