Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?