Systematic Diagnosis of Brittle Reasoning in Large Language Models