DeepMind: Why is AI so good at language? It's something in language itself

ZDNet

Can the frequency of language, and qualities such as polysemy, affect whether a neural network can suddenly solve tasks for which it was not specifically developed, known as "few-shot learning"? How is it that a program such as OpenAI's GPT-3 neural network can answer multiple-choice questions, or write a poem in a particular style, despite never being programmed for those specific tasks? It may be because human language has statistical properties that lead a neural network to expect the unexpected, according to new research by DeepMind, the AI unit of Google. Natural language, viewed statistically, has qualities that are "non-uniform," such as words that can stand for multiple things, known as "polysemy," like the word "bank," meaning a place where you put money or a rising mound of earth. And words that sound the same can stand for different things, known as homophones, like "here" and "hear." Those qualities of language are the focus of a paper posted on arXiv this month, "Data Distributional Properties Drive Emergent Few-Shot Learning in Transformers," by DeepMind scientists Stephanie C.Y. Chan, Adam Santoro, Andrew K. Lampinen, Jane X. Wang, Aaditya Singh, Pierre H. Richemond, Jay McClelland, and Felix Hill.
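As a rough illustration of the kind of "non-uniform" statistics the article describes, the short Python sketch below counts word frequencies in a made-up sentence: a handful of words dominate the counts, and a polysemous word like "bank" appears as a single surface form even though it is used in two senses. The sample sentence and the counting approach are illustrative assumptions on my part, not anything taken from the DeepMind paper.

```python
# Rough illustration of "non-uniform" word statistics and polysemy.
# The sample sentence is invented for demonstration; it is not taken
# from the DeepMind paper or the ZDNet article.
from collections import Counter

text = (
    "the bank raised the rate while the river bank flooded "
    "and the town moved to higher ground near the bank"
)
counts = Counter(text.lower().split())
for word, count in counts.most_common(5):
    print(f"{word!r}: {count}")
# A few words ("the", "bank") account for most of the mass, and "bank"
# appears with two different meanings under one surface form.
```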


Why the AGI discussion is getting heated again

#artificialintelligence

And right now, we are in the midst of one of those cycles. Tech entrepreneurs are warning about the alien invasion of AGI. The media is awash with reports of AI systems that are mastering language and moving toward generalization. And social media is filled with heated discussions about deep neural networks and consciousness. Recent years have seen some truly impressive advances in AI, and scientists have been able to make progress in some of the most challenging areas of the field.


Language Models are Few-shot Multilingual Learners

arXiv.org Artificial Intelligence

General-purpose language models have demonstrated impressive capabilities, performing on par with state-of-the-art approaches on a range of downstream natural language processing (NLP) tasks and benchmarks when inferring instructions from very few examples. Here, we evaluate the multilingual skills of GPT and T5 models in conducting multi-class classification on non-English languages without any parameter updates. We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones. Finally, we find that the in-context few-shot cross-lingual predictions of language models are significantly better than random prediction, and competitive with existing state-of-the-art cross-lingual models.
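To make the setup concrete, here is a minimal sketch of in-context few-shot prompting with English demonstrations followed by a non-English test sample. It uses GPT-2 from the Hugging Face transformers library as a small stand-in for the GPT and T5 models evaluated in the paper; the prompt format and label wording are assumptions for illustration, not the authors' exact protocol.

```python
# Minimal sketch of in-context few-shot cross-lingual classification.
# GPT-2 stands in for the larger models evaluated in the paper; the
# prompt template and labels are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A few English demonstrations, then a non-English test sample.
prompt = (
    "Review: The movie was fantastic. Sentiment: positive\n"
    "Review: A complete waste of time. Sentiment: negative\n"
    "Review: La pelicula fue maravillosa. Sentiment:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=3,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens, i.e. the predicted label.
prediction = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(prediction.strip())
```

Note that the model's parameters are never updated here; any "learning" happens only through the examples placed in the prompt, which is the in-context setting the abstract refers to.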


GPT-3, a Giant Step for Deep Learning and NLP

#artificialintelligence

A few days ago, OpenAI announced a new successor to their Language Model (LM): GPT-3. This is the largest model trained so far, with 175 billion parameters. While training this large model has its merits, reading a large portion of its 72 pages can be tiresome. In this blog post, I'll highlight the parts I find interesting for people familiar with LMs who merely wish to know (most of) the important points of this work. "The diversity of tasks the model is able to perform in a zero-shot setting suggests that high-capacity models trained to maximize the likelihood of a sufficiently varied text corpus begin to learn how to perform a surprising amount of tasks without the need for explicit supervision." This is an excerpt from the paper accompanying GPT-2.
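For a sense of what 175 billion parameters means in practice, the back-of-envelope calculation below estimates the memory needed just to store the weights. The bytes-per-parameter figures are standard precision assumptions on my part, not numbers from the paper or the blog post.

```python
# Back-of-envelope check on the scale of GPT-3's 175 billion parameters.
# The bytes-per-parameter values are assumed storage precisions, not
# details taken from the paper.
n_params = 175e9

for name, bytes_per_param in [("fp32", 4), ("fp16", 2)]:
    gigabytes = n_params * bytes_per_param / 1e9
    print(f"{name}: ~{gigabytes:.0f} GB just to hold the weights")
# fp32: ~700 GB, fp16: ~350 GB -- far beyond a single GPU's memory.
```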