From Canonical to Complex: Benchmarking LLM Capabilities in Undergraduate Thermodynamics