a new benchmark for evaluating general-purpose NLU systems, which is necessary given the saturation of the GLUE
–Neural Information Processing Systems
We thank all the reviewers for their time and comments. Our work builds directly on GLUE and maintains the same general structure. Our benchmark does have a less uniform API than GLUE, but we view this as both a pro and a con. WSC is a coreference task but is designed to require commonsense reasoning to solve. COPA explicitly tests systems' causal reasoning ability (somewhat related to commonsense reasoning).
Nov-16-2025, 06:13:43 GMT