Generation


The Warmup Guide to Hugging Face

#artificialintelligence

Since it was founded, the startup Hugging Face has created several open-source libraries for NLP tokenizers and transformers. One of them, the Hugging Face transformers package, is an immensely popular Python library that provides over 32 pre-trained models useful for a wide variety of natural language processing (NLP) tasks. It was created to make general-purpose architectures such as BERT, GPT-2, XLNet, XLM, DistilBERT, and RoBERTa available for Natural Language Understanding (NLU) and Natural Language Generation (NLG). The library supports tasks including classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages. Its ultimate goal is to make cutting-edge NLP easier to use for everyone.
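The library's pipeline API wraps those pre-trained models behind a single call. A minimal sketch, assuming the transformers package is installed; the model names are illustrative defaults, and any compatible checkpoint from the Hugging Face Hub can be substituted:

```python
# Minimal sketch of the transformers pipeline API.
# Requires: pip install transformers torch
from transformers import pipeline

# Text generation with GPT-2 (an illustrative checkpoint choice).
generator = pipeline("text-generation", model="gpt2")
print(generator("Natural language processing is", max_length=30, num_return_sequences=1))

# Sentiment classification with the pipeline's default DistilBERT checkpoint.
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes cutting-edge NLP easier to use."))
```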


Microsoft and Nvidia build largest ever AI to mimic human language

New Scientist

Microsoft and chip manufacturer Nvidia have created a vast artificial intelligence that can mimic human language more convincingly than ever before. But the cost and time involved in creating the neural network have called into question whether such AIs can continue to scale up. The new neural network, known as Megatron-Turing Natural Language Generation (MT-NLG), has 530 billion parameters, more than triple the scale of OpenAI's groundbreaking GPT-3 neural network, which was considered the state of the art until now.


AI Weekly: AI model training costs on the rise, highlighting need for new solutions

#artificialintelligence

This week, Microsoft and Nvidia announced that they trained what they claim is one of the largest and most capable AI language models to date: Megatron-Turing Natural Language Generation (MT-NLG). MT-NLG contains 530 billion parameters -- the parts of the model learned from historical data -- and achieves leading accuracy in a broad set of tasks, including reading comprehension and natural language inference. But building it didn't come cheap: experts peg the cost in the millions of dollars. Like other large AI systems, MT-NLG raises questions about the accessibility of cutting-edge research approaches in machine learning.


Microsoft and Nvidia team up to train one of the world's largest language models

#artificialintelligence

Microsoft and Nvidia today announced that they trained what they claim is the largest and most capable AI-powered language model to date: Megatron-Turing Natural Language Generation (MT-NLG). The successor to the companies' Turing NLG 17B and Megatron-LM models, MT-NLG contains 530 billion parameters and achieves "unmatched" accuracy in a broad set of natural language tasks, Microsoft and Nvidia say -- including reading comprehension, commonsense reasoning, and natural language inference. "The quality and results that we have obtained today are a big step forward in the journey towards unlocking the full promise of AI in natural language. The innovations of DeepSpeed and Megatron-LM will benefit existing and future AI model development and make large AI models cheaper and faster to train," wrote Paresh Kharya, Nvidia's senior director of product management and marketing for accelerated computing, and Ali Alvi, group program manager for the Microsoft Turing team, in a blog post.


Nvidia and Microsoft's new model may trump GPT-3 in race to NLP supremacy

#artificialintelligence

Chipmaker Nvidia and Microsoft claim they have built the world's largest artificial intelligence (AI) powered language model to date. The model, called Megatron-Turing Natural Language Generation (MT-NLG), is a successor to the two companies' earlier work, which gave rise to the Turing NLG 17B and Megatron-LM models. It contains 530 billion parameters, which the companies claim bring "unmatched" accuracy when the AI is put to work on natural language tasks. These include reading comprehension, commonsense reasoning, word sense disambiguation, and natural language inference. By comparison, OpenAI's GPT-3 has only 175 billion parameters.


Microsoft and Nvidia create 105-layer, 530 billion parameter language model that needs 280 A100 GPUs, but it's still biased

ZDNet

Nvidia and Microsoft have teamed up to create the Megatron-Turing Natural Language Generation model, which the duo claims is the "most powerful monolithic transformer language model trained to date". The AI model has 105 layers, 530 billion parameters, and operates on chunky supercomputer hardware like Selene. By comparison, the vaunted GPT-3 has 175 billion parameters. "Each model replica spans 280 NVIDIA A100 GPUs, with 8-way tensor-slicing within a node, and 35-way pipeline parallelism across nodes," the pair said in a blog post. The model was trained on 15 datasets containing a total of 339 billion tokens, and showed that larger models need less training to perform well.
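The per-replica GPU count quoted above follows directly from the two parallelism degrees; a quick back-of-the-envelope check (the figures are taken from the quoted blog post, not measured independently):

```python
# Sanity check of the parallelism configuration quoted above.
tensor_parallel = 8      # 8-way tensor-slicing within a node
pipeline_parallel = 35   # 35-way pipeline parallelism across nodes

gpus_per_replica = tensor_parallel * pipeline_parallel
print(gpus_per_replica)  # 280 NVIDIA A100 GPUs per model replica
```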


A Plug-and-Play Method for Controlled Text Generation

arXiv.org Artificial Intelligence

Large pre-trained language models have repeatedly shown their ability to produce fluent text. Yet even when starting from a prompt, generation can continue in many plausible directions. Current decoding methods with the goal of controlling generation, e.g., to ensure specific words are included, either require additional models or fine-tuning, or work poorly when the task at hand is semantically unconstrained, e.g., story generation. In this work, we present a plug-and-play decoding method for controlled language generation that is so simple and intuitive, it can be described in a single sentence: given a topic or keyword, we add a shift to the probability distribution over our vocabulary towards semantically similar words. We show how annealing this distribution can be used to impose hard constraints on language generation, something no other plug-and-play method is currently able to do with SOTA language generators. Despite the simplicity of this approach, we see it works incredibly well in practice: decoding from GPT-2 leads to diverse and fluent sentences while guaranteeing the appearance of given guide words. We perform two user studies, revealing that (1) our method outperforms competing methods in human evaluations; and (2) forcing the guide words to appear in the generated text has no impact on the fluency of the generated text.
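As a rough sketch of the single-sentence idea above, assuming the shift is computed from cosine similarity between the guide word's embedding and the model's input-embedding matrix: the shift strength `lam` is an invented hyperparameter and the annealing schedule is omitted, so this illustrates the mechanism rather than the authors' exact method.

```python
# Illustrative sketch: bias GPT-2's next-token distribution toward words
# semantically similar to a guide keyword. `lam` and the similarity source
# (the model's input embeddings) are assumptions; the paper also anneals the
# shift to impose hard constraints, which is omitted here.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def shifted_next_token(prompt_ids, keyword_id, lam=5.0):
    """Sample the next token from logits shifted toward the keyword."""
    with torch.no_grad():
        logits = model(prompt_ids).logits[0, -1]              # next-token logits
    emb = model.get_input_embeddings().weight                 # (vocab, dim)
    sim = torch.nn.functional.cosine_similarity(emb, emb[keyword_id].unsqueeze(0), dim=-1)
    probs = torch.softmax(logits + lam * sim, dim=-1)         # shifted distribution
    return torch.multinomial(probs, 1)

prompt = tokenizer("The weather today is", return_tensors="pt").input_ids
keyword_id = tokenizer(" sunny", add_special_tokens=False).input_ids[0]
print(tokenizer.decode(shifted_next_token(prompt, keyword_id)))
```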


Learning Natural Language Generation from Scratch

arXiv.org Machine Learning

Since the development of generic language models trained on massive unlabelled text corpora (Radford et al., 2019; Brown et al., 2020), state-of-the-art language processing systems rely on sequential transfer learning (Ruder, 2019). The pretrained Language Model (LM) is fine-tuned on the downstream task using a standard supervised learning (SL) objective (Wu et al., 2019; Peters et al., 2019). Yet such an approach suffers from several issues (Chen et al., 2020): (i) catastrophic forgetting, when a model forgets previously learned knowledge and overfits to target domains, (ii) computational inefficiency from fine-tuning billion-parameter networks, and (iii) the need for supervised datasets. Moreover, task-specific language models learned with SL suffer from well-studied text degeneration issues (Holtzman et al., 2019), such as exposure bias (Bengio et al., 2015), language biases (Saleh et al., 2020; Jaques et al., 2020), or a lack of diversity (Li et al., 2015). On the other hand, text generation can be naturally framed as a sequential decision-making problem, with the sequence of words seen as successive actions over a vocabulary. Thus, some researchers have recently focused on learning language models using Reinforcement Learning (RL) instead (Strub et al., 2017; Das et al., 2017; Narasimhan et al., 2015).
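To make the sequential decision-making framing concrete, here is a minimal REINFORCE-style sketch in which each generated token is an action and a scalar reward on the finished sequence drives the policy gradient. The tiny GRU policy and the crude diversity reward are placeholders, not the models or objectives studied in the paper.

```python
# REINFORCE sketch: language generation as sequential decision making.
# Tokens are actions; a sequence-level reward updates the policy.
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Toy autoregressive policy over a small vocabulary (placeholder for an LM)."""
    def __init__(self, vocab_size=100, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, token, state=None):
        out, state = self.rnn(self.embed(token), state)
        return self.head(out[:, -1]), state

def sequence_reward(tokens):
    # Placeholder reward (here: token diversity); in practice, task success,
    # human preference, or any other sequence-level signal.
    return len(set(tokens)) / len(tokens)

policy = TinyPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(100):
    tokens, log_probs, state = [0], [], None             # 0 acts as a BOS token
    for _ in range(10):                                   # generate 10 actions
        logits, state = policy(torch.tensor([[tokens[-1]]]), state)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        tokens.append(action.item())
    reward = sequence_reward(tokens[1:])
    loss = -reward * torch.stack(log_probs).sum()         # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```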


Types Of Artificial Intelligence Technologies - ONPASSIVE

#artificialintelligence

In 1955, the term "artificial intelligence" was coined to describe a new branch of computer science. As the market for AI technology grows and flourishes, it swiftly and drastically alters numerous aspects of our everyday lives. Several start-ups and internet behemoths are vying to acquire these technologies. In this post, we cover the top 8 Artificial Intelligence Technologies that everyone should be aware of. First, let's understand what Artificial Intelligence is. Artificial intelligence allows a computer system to be taught and then apply what it has learned to new data.


DEGREE: A Data-Efficient Generative Event Extraction Model

arXiv.org Artificial Intelligence

Event extraction (EE) aims to identify structured events, including event triggers and their corresponding arguments, from unstructured text. Most existing works rely on a large number of labeled instances to train models, while labeled data can be expensive to obtain. In this work, we present a data-efficient event extraction method by formulating event extraction as a natural language generation problem. The formulation allows us to inject knowledge of label semantics, event structure, and output dependencies into the model. Given a passage and an event type, our model learns to summarize the passage into a templated sentence with a predefined structure. The template is event-type-specific, manually created, and contains event trigger and argument information. Lastly, a rule-based algorithm is used to derive the trigger and argument predictions from the generated sentence. Our method inherently enjoys the following benefits: (1) The pretraining of generative language models helps incorporate the semantics of the labels for generative EE. (2) The autoregressive generation process and our end-to-end design for extracting triggers and arguments force the model to capture the dependencies among the output triggers and their arguments. (3) The predefined templates form concrete yet flexible rules that hint the models at the valid patterns for each event type, reducing their burden of learning structures from the data. Empirical results show that our model achieves superior performance over strong baselines on EE tasks in the low-data regime and achieves results competitive with the current state of the art when more data becomes available.
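A hedged sketch of the template mechanism described above: an event-type-specific template with named slots, and a rule-based parser that reads the trigger and arguments back out of a generated, filled-in sentence. The template wording, slot names, and the hand-written "generated" sentence are illustrative stand-ins, not the paper's actual templates.

```python
# Sketch: rule-based recovery of trigger/argument predictions from a sentence
# generated against an event-type-specific template.
import re

# Illustrative template for an "Attack" event (slots in angle brackets).
ATTACK_TEMPLATE = "<trigger> happened. <attacker> attacked <target> using <instrument>."

def parse_filled_template(filled: str) -> dict:
    """Derive trigger and argument predictions from a filled-in template sentence."""
    pattern = (r"(?P<trigger>.+?) happened\. "
               r"(?P<attacker>.+?) attacked (?P<target>.+?) using (?P<instrument>.+?)\.")
    match = re.match(pattern, filled)
    return match.groupdict() if match else {}

# In the full system this sentence would be produced by a fine-tuned generative
# LM conditioned on the passage and the template; here it is a hand-written stand-in.
generated = "The bombing happened. Insurgents attacked a military convoy using a roadside bomb."
print(parse_filled_template(generated))
# {'trigger': 'The bombing', 'attacker': 'Insurgents',
#  'target': 'a military convoy', 'instrument': 'a roadside bomb'}
```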