6 SupplementaryMaterial
–Neural Information Processing Systems
The original CLUTRR data generation framework made sure that each testproof is not in the training set in order to test whether a model is able to generalize to unseen proofs. Initial results on the original CLUTRR test sets resulted in strong model performance ( 99%) on levels seen during training (2, 4, 6) but no generalization at all ( 0%) to other levels. The models are given as input " [story] [query] " and asked to generate the proof and answer. Models are trained on levels2,4,6only. In our case, the entity names are important to evaluate systematic generalization.
Neural Information Processing Systems
Feb-19-2026, 09:45:27 GMT
- Technology: