Cognitive Load Limits in Large Language Models: Benchmarking Multi-Hop Reasoning
