model on any particular supervised task). We compared with GPT-2 (345M) on the Winograd Schema Challenge