Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
