When All Options Are Wrong: Evaluating Large Language Model Robustness with Incorrect Multiple-Choice Options
