Reasoning Models are Test Exploiters: Rethinking Multiple-Choice
