Large-scale cloze evaluation reveals that token prediction tasks are neither lexically nor semantically aligned

Open in new window