Large Language Model
A breakthrough unfolds – DeepMind: The Podcast (Season 2, Episode 1)
In late 2020, DeepMind's AI system, AlphaFold, solved a 50-year-old grand challenge in biology, known as the protein-folding problem. A headline in the journal Nature read, "It will change everything", and the President of the UK's Royal Society called it a "stunning advance [that arrived] decades before many in the field would have predicted". In this episode, Hannah uncovers the inside story of AlphaFold from the people who made it happen and finds out how it could help transform the future of healthcare and medicine.
281 years in the making: GPT-3 and de la Mettrie
Today two books ended up on my desk: one published last year (an experimentation guide with a GitHub example library for GPT-3) and one published in 1741 (Man a Machine, by Julien Offray de la Mettrie). So I did what anyone with these two books in front of them would do: I let GPT-3 write a sample of how it would have completed de la Mettrie's work. Although de la Mettrie never spoke specifically about technology, being an engineer and a physician standing in the middle of the rise of the industrial age must have prompted him to write this brilliant (although sometimes very fragmented) essay. It was a piece of work that deviated from the literary work of its time. It took a bold approach in describing the body as a singular system (instead of a mere box used by divine powers) and the mind as a computing machine from which consciousness arose (and not a parallel entity belonging to a spiritual world). It is written from a naturalist perspective, a strong statement against dualism and spiritualism.
Chain of Thought Prompting Elicits Reasoning in Large Language Models
Wei, Jason, Wang, Xuezhi, Schuurmans, Dale, Bosma, Maarten, Chi, Ed, Le, Quoc, Zhou, Denny
Although scaling up language model size has reliably improved performance on a range of NLP tasks, even the largest models currently struggle with certain reasoning tasks such as math word problems, symbolic manipulation, and commonsense reasoning. This paper explores the ability of language models to generate a coherent chain of thought -- a series of short sentences that mimic the reasoning process a person might have when responding to a question. Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks that otherwise have flat scaling curves.
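The prompting idea in the abstract can be illustrated with a minimal sketch. It only assembles the prompt text and does not call any model. The helper name `build_cot_prompt`, the "Q:/A:" layout, and the arithmetic exemplar are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical helper (not from the paper): builds a few-shot prompt in which
# every worked example spells out its reasoning before giving the answer.
def build_cot_prompt(exemplars, question):
    """exemplars is a list of (question, reasoning, answer) string tuples."""
    parts = []
    for q, reasoning, answer in exemplars:
        # The reasoning sentences come first, then the final answer.
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    # The new question ends with an open "A:" so that a model would continue
    # with its own chain of thought in the same style.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Illustrative exemplar, written for this sketch.
exemplars = [
    (
        "A shelf holds 4 boxes with 6 pencils each. 5 pencils are removed. "
        "How many pencils remain?",
        "4 boxes of 6 pencils is 4 * 6 = 24 pencils. "
        "Removing 5 leaves 24 - 5 = 19.",
        "19",
    ),
]

prompt = build_cot_prompt(
    exemplars,
    "A train has 3 cars with 20 seats each. 12 seats are taken. "
    "How many seats are free?",
)
print(prompt)
```

The resulting string would be sent to a language model as-is. A standard few-shot prompt would use the same layout but omit the reasoning sentences from each exemplar.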
How Robust are Discriminatively Trained Zero-Shot Learning Models?
Yucel, Mehmet Kerim, Cinbis, Ramazan Gokberk, Duygulu, Pinar
Data shift robustness has been primarily investigated from a fully supervised perspective, and robustness of zero-shot learning (ZSL) models has been largely neglected. In this paper, we present novel analyses on the robustness of discriminative ZSL to image corruptions. We subject several ZSL models to a large set of common corruptions and defenses. In order to realize the corruption analysis, we curate and release the first ZSL corruption robustness datasets SUN-C, CUB-C and AWA2-C. We analyse our results by taking into account the dataset characteristics, class imbalance, class transitions between seen and unseen classes and the discrepancies between ZSL and GZSL performances. Our results show that discriminative ZSL suffers from corruptions and this trend is further exacerbated by the severe class imbalance and model weakness inherent in ZSL methods. We then combine our findings with those based on adversarial attacks in ZSL, and highlight the different effects of corruptions and adversarial examples, such as the pseudo-robustness effect present under adversarial attacks. We also obtain new strong baselines for both models with the defense methods. Finally, our experiments show that although existing methods to improve robustness somewhat work for ZSL models, they do not produce a tangible effect.
Global Big Data Conference
OpenAI's impressive AI language model GPT-3 has plenty of things going for it, but with 175 billion parameters no one would claim it's particularly streamlined. The Allen Institute for AI (AI2) has demonstrated a model that performs as well as or better than GPT-3 on answering questions, but is a tenth the size. Macaw, AI2's model, emerged from research being done at the nonprofit into creating an AI that performs at human levels on standardized tests. "After we got a very high score they moved on to harder questions," said AI2 head Oren Etzioni. "There's this paradox where sometimes the questions that are easiest for people are the hardest for machines -- and the biggest gap was in common sense." For instance, he said, when asked "When did Tom Hanks land on the moon?", GPT-3 says 1995, since that's when the film Apollo 13 came out.
DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators
Kao, Sheng-Chun, Huang, Xiaoyu, Krishna, Tushar
Dataflow/mapping decides the compute and energy efficiency of DNN accelerators. Many mappers have been proposed to tackle the intra-layer map-space. However, mappers for the inter-layer map-space (aka layer-fusion map-space) have rarely been discussed. In this work, we propose a mapper, DNNFuser, specifically focusing on this layer-fusion map-space. While existing SOTA DNN mapping explorations rely on search-based mappers, this is the first work, to the best of our knowledge, to propose a one-shot inference-based mapper. We leverage GPT, a well-known language model, as our DNN architecture to learn layer-fusion optimization as a sequence modeling problem. Further, the trained DNNFuser can generalize its knowledge and infer new solutions for unseen conditions. Within one inference pass, DNNFuser can infer solutions with performance comparable to the ones found by a highly optimized search-based mapper while being 66x-127x faster.
Start of the European AI language model project OpenGPT-X
Under the leadership of the Fraunhofer Institutes for Intelligent Analysis and Information Systems (IAIS) and for Integrated Circuits (IIS), the OpenGPT-X project is starting with the goal of developing a large AI language model for Europe. Particular attention is being paid to data protection as well as European language diversity. "International competitors have already recognized the enormous disruptive potential of AI language technologies for business, industry and society. A European AI language model like OpenGPT-X is therefore imperative to ensure Europe's digital sovereignty and market independence," says Dr. Nicolas Flores-Herr, head of the project at Fraunhofer IAIS. Due to the high technical requirements, such as computing power, such powerful language models can so far only be implemented by large companies or consortia.
AI2 shows off an open, Q&A-focused rival to GPT3 – TechCrunch
OpenAI's impressive AI language model GPT-3 has plenty of things going for it, but with 175 billion parameters no one would claim it's particularly streamlined. The Allen Institute for AI (AI2) has demonstrated a model that performs as well as or better than GPT-3 on answering questions, but is a tenth the size. Macaw, AI2's model, emerged from research being done at the nonprofit into creating an AI that performs at human levels on standardized tests. "After we got a very high score they moved on to harder questions," said AI2 head Oren Etzioni. "There's this paradox where sometimes the questions that are easiest for people are the hardest for machines -- and the biggest gap was in common sense."
ML and NLP Research Highlights of 2021
In this post, I will cover the papers and research areas that I found most inspiring. I tried to cover the papers that I was aware of but likely missed many relevant ones. Feel free to highlight them as well as ones that you found inspiring in the comments. Pre-trained models were applied in many different domains and started to be considered critical for ML research [1]. In computer vision, supervised pre-trained models such as Vision Transformer [2] have been scaled up [3] and self-supervised pre-trained models have started to match their performance [4]. The latter have been scaled beyond the controlled environment of ImageNet to random collections of images [5]. In speech, new models have been built based on wav2vec 2.0 [6] such as W2v-BERT [7] as well as more powerful multilingual models such as XLS-R [8]. At the same time, we saw new unified pre-trained models for previously under-researched modality pairs such as for videos and language [9] as well as speech and language [10]. In vision and language, controlled studies shed new light on important components of such multi-modal models [11][12].
AI models are becoming better at answering questions, but they're not perfect
Late last year, the Allen Institute for AI, the research institute founded by the late Microsoft cofounder Paul Allen, quietly open-sourced a large AI language model called Macaw. Unlike other language models that've captured the public's attention recently (see OpenAI's GPT-3), Macaw is fairly limited in what it can do, only answering and generating questions. But the researchers behind Macaw claim that it can outperform GPT-3 on a set of questions, despite being an order of magnitude smaller.