Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning