The Turking Test: Can Language Models Understand Instructions?