ORCHARD: A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning
