Lenz, Barak
Human or Not? A Gamified Approach to the Turing Test
Jannai, Daniel, Meron, Amos, Lenz, Barak, Levine, Yoav, Shoham, Yoav
"I believe that in 50 years' time it will be possible to make computers play the imitation game so well that an average interrogator will have no more than 70% chance of making the right identification after 5 minutes of questioning." Over the course of a month, the game was played by over 1.5 million users who engaged in anonymous two-minute chat sessions with either another human or an AI language model which was prompted to behave like humans. The task of the players was to correctly guess whether they spoke to a person or to an AI. This largest scale Turing-style test conducted to date revealed some interesting facts. For example, overall users guessed the identity of their partners correctly in only 68% of the games. In the subset of the games in which users faced an AI bot, users had even lower correct guess rates of 60% (that is, not much higher than chance). While this experiment calls for many extensions and refinements, these findings already begin to shed light on the inevitable near future which will commingle humans and AI. The famous Turing test, originally proposed by Alan Turing in 1950 as "the imitation game" (Turing, 1950), was proposed as an operational test of intelligence, namely, testing a machine's ability to exhibit behavior indistinguishable from that of a human. In this proposed test, a human evaluator engages in a natural language conversation with both another human and a machine, and tries to distinguish between them. If the evaluator is unable to tell which is which, the machine is said to have passed the test.
PMI-Masking: Principled masking of correlated spans
Levine, Yoav, Lenz, Barak, Lieber, Opher, Abend, Omri, Leyton-Brown, Kevin, Tennenholtz, Moshe, Shoham, Yoav
Masking tokens uniformly at random constitutes a common flaw in the pretraining of Masked Language Models (MLMs) such as BERT. We show that such uniform masking allows an MLM to minimize its training objective by latching onto shallow local signals, leading to pretraining inefficiency and suboptimal downstream performance. To address this flaw, we propose PMI-Masking, a principled masking strategy based on the concept of Pointwise Mutual Information (PMI), which jointly masks a token n-gram if it exhibits high collocation over the corpus. PMI-Masking is motivated by, unifies, and improves upon prior, more heuristic approaches that attempt to address the drawback of uniform random token masking, such as whole-word masking, entity/phrase masking, and random-span masking. Specifically, we show experimentally that PMI-Masking reaches the performance of prior masking approaches in half the training time, and consistently improves performance at the end of training.

In the couple of years since BERT was introduced in a seminal paper by Devlin et al. (2019a), Masked Language Models (MLMs) have rapidly advanced the NLP frontier (Sun et al., 2019; Liu et al., 2019; Joshi et al., 2020; Raffel et al., 2019). At the heart of the MLM approach is the task of predicting a masked subset of the text given the remaining, unmasked text. The text itself is broken up into tokens, each token consisting of a word or part of a word; thus "chair" constitutes a single token, but out-of-vocabulary words like "eigen-value" are broken up into several sub-word tokens.
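To make the collocation idea concrete, the Python sketch below scores adjacent token pairs by PMI(w1, w2) = log(p(w1, w2) / (p(w1) p(w2))) and then prefers to mask high-PMI pairs as whole spans rather than masking tokens independently. It is a simplified illustration under stated assumptions, not the paper's method: the function names, the min_count cutoff, and the 15% masking budget are hypothetical choices, the sketch is restricted to bigrams, and the paper itself ranks longer n-grams with a generalized PMI measure and builds a fixed masking vocabulary over the pretraining corpus.

```python
import math
import random
from collections import Counter

def bigram_pmi(tokens, min_count=10):
    """Score each adjacent token pair by pointwise mutual information:
    PMI(w1, w2) = log( p(w1, w2) / (p(w1) * p(w2)) ).
    High-PMI pairs are strong collocations and thus candidates for joint masking."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:  # skip rare pairs: their PMI estimates are noisy
            continue
        p_joint = count / n_bi
        p_indep = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        scores[(w1, w2)] = math.log(p_joint / p_indep)
    return scores

def choose_masked_positions(tokens, collocations, mask_budget=0.15, seed=0):
    """Pick token positions to mask, covering whole collocations when possible
    and falling back to single-token masking otherwise."""
    rng = random.Random(seed)
    target = int(mask_budget * len(tokens))
    masked = set()
    order = list(range(len(tokens)))
    rng.shuffle(order)
    for i in order:
        if len(masked) >= target:
            break
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            masked.update((i, i + 1))   # mask the collocated bigram as one span
        else:
            masked.add(i)               # otherwise mask a single token
    return masked

# Toy usage: treat the top-scoring pairs as the masking vocabulary.
corpus = "new york is a city in new york state".split()
pmi = bigram_pmi(corpus, min_count=1)
top_pairs = {pair for pair, _ in sorted(pmi.items(), key=lambda kv: -kv[1])[:2]}
print(choose_masked_positions(corpus, top_pairs))
```

The design point this conveys is that span boundaries are chosen by a corpus-level statistic rather than at random, so strongly collocated n-grams (e.g. "new york") are hidden together and cannot be trivially reconstructed from their immediate neighbors.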