Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries