Out-of-distribution generalisation is hard: evidence from ARC-like tasks
