Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment

Open in new window