Evaluating Multi-Hop Reasoning in Large Language Models: A Chemistry-Centric Case Study
