Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning
