Let's start with the basics. GPT-3 stands for Generative Pre-trained Transformer 3, and it is a sequence transduction model. Simply put, sequence transduction is a technique that transforms an input sequence into an output sequence. GPT-3 is a language model, which means that, using sequence transduction, it can predict the likelihood of an output sequence given an input sequence. This can be used, for instance, to predict which word makes the most sense given a preceding text sequence.
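The core idea of predicting the most likely next word can be illustrated with a toy bigram model. This is a minimal sketch for intuition only, not how GPT-3 works internally (GPT-3 uses a transformer network, not simple word-pair counts), but the prediction objective is the same:

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count word-pair frequencies to estimate P(next_word | current_word)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(model, word):
    """Return the most likely next word and its estimated probability."""
    followers = model[word.lower()]
    total = sum(followers.values())
    best, freq = followers.most_common(1)[0]
    return best, freq / total

# A tiny made-up corpus for illustration
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # "cat" is the most frequent follower of "the"
```

A large language model does the same kind of conditional prediction, except that it conditions on the entire preceding sequence rather than a single word, and learns the distribution with billions of parameters instead of a frequency table.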
I love being a data scientist working in Natural Language Processing (NLP) right now. The breakthroughs and developments are occurring at an unprecedented pace. From the super-efficient ULMFiT framework to Google's BERT, NLP is truly in the midst of a golden era. And at the heart of this revolution is the concept of the Transformer. This has transformed the way we data scientists work with text data – and you'll soon see how in this article.
The passage you just read wasn't written by me, the author of this article, nor was it written by the editor. No. What you just read was written entirely by OpenAI's GPT-2 language model, prompted only with the word "Today". Apart from another fancy acronym, GPT-2 brought along somewhat coherent (semantically, at least) language generation capabilities, some semblance of hope for zero-shot transfer learning, and a transformer network with approximately 1.5 billion parameters, trained on a text corpus of over 40 gigabytes of internet wisdom. In this post, I'm not going to talk about better language models and their implications. As the great Stan Lee once said, "'nuff said" about that.
Training Optimus Prime, M.D.: Generating Medical Certification Items by Fine-Tuning OpenAI's gpt2 Transformer Model. Matthias von Davier. August 21st, 2019. Abstract. Objective: Showcasing Artificial Intelligence, in particular deep neural networks, for language modeling aimed at automated generation of medical education test items. Materials and Methods: OpenAI's gpt2 transformer language model was retrained using PubMed's open access text mining database. The retraining was done using toolkits based on tensorflow-gpu available on GitHub, on a workstation equipped with two GPUs. Results: In comparison to a study that used character-based recurrent neural networks trained on open access items, the retrained transformer architecture allows generating higher-quality text that can be used as draft input for medical education assessment material. In addition, prompted text generation can be used for the production of distractors suitable for multiple-choice items used in certification exams. Discussion: The current state of neural-network-based language models can be used to develop tools in support of authoring medical education exams, using models retrained on corpora consisting of general medical text collections. Conclusion: Future experiments with more recent transformer models (such as Grover and Transformer-XL) using existing medical certification exam item pools are expected to further improve results and facilitate the development of assessment materials. Objective. The aim of this article is to provide evidence on the current state of automated item generation (AIG) using deep neural networks (DNNs).
Based on earlier work, a first paper that tackled this issue used character-based recurrent neural networks. Time flies in the domain of DNNs used for language modeling, indeed: on August 13th, 2019, the day this paper was submitted to internal review, NVIDIA published yet another, larger language model based on the transformer used in this paper. Megatron-LM (apart from taking a bite out of the pun in this article's title) is currently the largest language model based on the transformer architecture. This latest neural network language model has 8 billion parameters, which is incomprehensible compared to the type of neural networks we used only two decades ago. At that time, in the winter semester of 1999-2000, I taught classes about artificial Neural Networks (NNs). Back then, Artificial Intelligence (AI) had already entered what was referred to as the AI winter, as most network sizes were limited to rather small architectures unless supercomputers were employed.
Natural language understanding (NLU) is one of the richest areas in deep learning, encompassing highly diverse tasks such as reading comprehension, question answering, and machine translation. Traditionally, NLU models focus on solving only one of those tasks and are of little use when applied to other NLU domains. Also, NLU models have mostly evolved as supervised learning architectures that require expensive training exercises. Recently, researchers from OpenAI challenged both assumptions in a paper that introduces a single unsupervised NLU model able to achieve state-of-the-art performance on many NLU tasks. The idea of using unsupervised learning for different NLU tasks has been gaining traction in the last few months.