A Discussion of the Benchmark Selection for Evaluation

Feb-7-2026, 13:05:34 GMT, Neural Information Processing Systems

Given that NeSS achieves strong results on the synthetic natural language benchmarks in our evaluation, a natural question is whether it would also improve performance on commonly used natural language datasets, such as large-scale machine translation benchmarks. However, most existing natural language benchmarks are not designed to evaluate the compositional generalization of models. Instead, their main challenge is handling inherently ambiguous and potentially noisy natural language inputs. In particular, their training and test sets are usually drawn from the same distribution, so they do not test compositional generalization. We therefore did not run experiments on these datasets.
