Can Out-of-Distribution Evaluations Uncover Reliance on Shortcuts? A Case Study in Question Answering

Open in new window