Can large language models reason about medical questions?