From General Reasoning to Domain Expertise: Uncovering the Limits of Generalization in Large Language Models