Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases