SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
arXiv.org Artificial Intelligence
In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently come close to the level of non-expert humans, suggesting limited headroom for further research. This paper recaps lessons learned from the GLUE benchmark and presents SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, improved resources, and a new public leaderboard. SuperGLUE will be available soon at super.gluebenchmark.com.
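The "single-number metric" the abstract refers to is, in GLUE's case, an unweighted macro-average of per-task scores. The sketch below illustrates that aggregation style; it is a minimal illustration, not the benchmark's official scoring code, and the per-task numbers are hypothetical placeholders (the task names BoolQ, CB, and COPA are real SuperGLUE tasks).

```python
# Minimal sketch of a GLUE-style single-number metric: the overall score
# is taken here as the unweighted (macro) average of per-task scores.
# Scores below are hypothetical placeholders, not leaderboard numbers.

def overall_score(task_scores: dict) -> float:
    """Macro-average per-task scores into one summary number."""
    return sum(task_scores.values()) / len(task_scores)

if __name__ == "__main__":
    scores = {
        "BoolQ": 0.76,  # hypothetical accuracy
        "CB": 0.84,     # hypothetical averaged F1/accuracy
        "COPA": 0.70,   # hypothetical accuracy
    }
    print(f"Overall score: {overall_score(scores):.3f}")
```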
May 1, 2019