The Reliability Paradox: Exploring How Shortcut Learning Undermines Language Model Calibration
