Review for NeurIPS paper: Language Models are Few-Shot Learners

Neural Information Processing Systems 

Strengths: The paper is one of those research works that are conceptually simple (training a very large language model at scale) yet ground-breaking (it redefines what we thought was possible). The amount of work behind it is enormous, and the combination of simplicity, strong engineering, and new discovery makes it a very enjoyable paper to read. I particularly enjoyed the part on the distinction between zero-, one-, and few-shot learning, and seeing the incredible capacity of the GPT-3 model. The fact that a very large neural network can perform a language task without any fine-tuning is definitely novel and, in my opinion, unforeseen. This takes us much closer to a system capable of performing multiple tasks at once with little to no supervision, as humans do, and gives a hint of what will be possible in the *near* future with large-scale self-supervised techniques, possibly combined with multiple modalities.