Large Language Model
Artificial Intelligence and Cybersecurity. What new threats should we prepare for?
OpenAI is an AI research and deployment company whose mission is to ensure that artificial general intelligence benefits all of humanity. In July, OpenAI released GPT-3, a new language model with 175 billion parameters, 10x more than any previous non-sparse language model, capable of programming, designing, and even talking about politics or economics. There is a Twitter thread with some of the most curious cases. Despite the huge hype, Sam Altman, CEO of OpenAI and former president of Y Combinator, said plainly: "The GPT-3 hype is way too much. It is impressive but it still has serious weaknesses and sometimes makes very silly mistakes."
Multi-label Few/Zero-shot Learning with Knowledge Aggregated from Multiple Label Graphs
Lu, Jueqing, Du, Lan, Liu, Ming, Dipnall, Joanna
Few/zero-shot learning is a major challenge in many classification tasks, where a classifier is required to recognise instances of classes that have very few or even no training samples. It becomes more difficult in multi-label classification, where each instance is labelled with more than one class. In this paper, we present a simple multi-graph aggregation model that fuses knowledge from multiple label graphs encoding different semantic label relationships, in order to study how the aggregated knowledge can benefit multi-label zero/few-shot document classification. The model utilises three kinds of semantic information: pre-trained word embeddings, label descriptions, and pre-defined label relations. Experimental results on two large clinical datasets (MIMIC-II and MIMIC-III) and the EU legislation dataset show that methods equipped with multi-graph knowledge aggregation achieve significant performance improvements across almost all measures on few/zero-shot labels.
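To make the idea of fusing several label graphs concrete, here is a minimal sketch in plain Python. It is not the authors' model: the abstract does not say how the graphs are combined, so this assumes one unweighted GCN-style propagation step per graph (symmetric normalization with self-loops) followed by a simple mean across graphs, with one sigmoid score per label. The helper names (`normalize_adjacency`, `aggregate_label_graphs`, `score_document`), the toy features, and the toy graphs are all hypothetical.

```python
import math


def normalize_adjacency(adj):
    """GCN-style symmetric normalization: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    # Self-loops let every label keep part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x d)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def aggregate_label_graphs(features, graphs):
    """Propagate label features once over each graph, then average across graphs."""
    propagated = [matmul(normalize_adjacency(g), features) for g in graphs]
    n, d = len(features), len(features[0])
    return [[sum(p[i][j] for p in propagated) / len(propagated) for j in range(d)]
            for i in range(n)]


def score_document(doc_vec, label_embs):
    """One independent sigmoid per label, as in multi-label classification."""
    return [1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(doc_vec, emb))))
            for emb in label_embs]


# Toy example: 3 labels with 2-d features (e.g. averaged description embeddings).
# Label 2 plays the zero-shot role: it has no training documents of its own,
# so its aggregated embedding borrows from related labels.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hierarchy = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]   # labels 0 and 1 share a parent
similarity = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # labels 0 and 2 have similar descriptions

label_embs = aggregate_label_graphs(features, [hierarchy, similarity])
print(label_embs)
print(score_document([1.0, -1.0], label_embs))
```

Averaging is the simplest possible fusion; attention or learned weights over graphs would be natural alternatives, but which one the paper uses is not stated here.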
Zero-shot Entity Linking with Efficient Long Range Sequence Modeling
Yao, Zonghai, Cao, Liangliang, Pan, Huapu
This paper considers the problem of zero-shot entity linking, in which entities linked at test time may not be present in training. Following prevailing BERT-based research efforts, we find that a simple yet effective approach is to extend long-range sequence modeling. Unlike many previous methods, our method does not require expensive pre-training of BERT with long position embeddings. Instead, we propose an efficient position embedding initialization method called Embedding-repeat, which initializes larger position embeddings based on BERT-Base. On Wikia's zero-shot EL dataset, our method improves the SOTA from 76.06% to 79.08%, and on its long data, the corresponding improvement is from 74.57% to 82.14%. Our experiments suggest the effectiveness of long-range sequence modeling.
Figure 1: Only models with a large ERLength can solve this entity linking problem, because only they can get the critical information in the mention contexts and entity description.
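The abstract only says that Embedding-repeat initializes a larger position-embedding table from BERT-Base's. One natural reading, assumed here and not confirmed by the text, is to tile the pretrained rows so that new position i starts from pretrained row i mod 512 (BERT-Base has 512 learned positions). The function name `embedding_repeat` is hypothetical, and a tiny toy table stands in for the real 512 x 768 matrix.

```python
def embedding_repeat(pretrained, new_len):
    """Build a longer position-embedding table by tiling a pretrained one.

    Row i of the result is a copy of pretrained row i % old_len, so the first
    old_len positions are unchanged and later positions reuse the learned rows.
    (Assumed tiling scheme; the paper's exact procedure may differ.)
    """
    old_len = len(pretrained)
    if new_len < old_len:
        raise ValueError("new_len must be at least the pretrained length")
    # list(...) copies each row so later fine-tuning can update them independently.
    return [list(pretrained[i % old_len]) for i in range(new_len)]


# Toy stand-in for BERT-Base's table: 4 positions x 3 dims instead of 512 x 768.
pretrained = [[0.1 * p, 0.2 * p, 0.3 * p] for p in range(4)]
extended = embedding_repeat(pretrained, 10)
print(len(extended))                 # number of positions in the new table
print(extended[5] == pretrained[1])  # position 5 starts from pretrained row 1
```

With PyTorch, the same tiling could be written with `Tensor.repeat` on the position-embedding weight followed by slicing to the desired length. The appeal over random initialization is that every new position starts from a vector the model already knows how to use, so only fine-tuning, not full pre-training, is needed.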
Load What You Need: Smaller Versions of Multilingual BERT
Abdaoui, Amine, Pradel, Camille, Sigel, Grégoire
Pre-trained Transformer-based models are achieving state-of-the-art results on a variety of Natural Language Processing data sets. However, the size of these models is often a drawback for their deployment in real production applications. In the case of multilingual models, most of the parameters are located in the embeddings layer. Therefore, reducing the vocabulary size should have an important impact on the total number of parameters. In this paper, we propose to generate smaller models that handle fewer languages according to the targeted corpora. We present an evaluation of smaller versions of multilingual BERT on the XNLI data set, but we believe that this method may be applied to other multilingual transformers. The obtained results confirm that we can generate smaller models that keep comparable results, while reducing up to 45% of the total number of parameters. We compared our models with DistilmBERT (a distilled version of multilingual BERT) and showed that, unlike language reduction, distillation induced a 1.7% to 6% drop in the overall accuracy on the XNLI data set. The presented models and code are publicly available.
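The core idea, keeping only the embedding rows for tokens that actually occur in the target-language corpus, can be sketched as follows. This is a minimal illustration under assumed inputs, not the released code: it assumes the corpus has already been tokenized into ids with the original multilingual vocabulary, and the special-token ids, toy corpus, and helper names are hypothetical.

```python
from collections import Counter


def select_vocabulary(tokenized_corpus, special_ids, min_count=1):
    """Return the sorted token ids worth keeping: special tokens plus frequent ids."""
    counts = Counter(tid for sentence in tokenized_corpus for tid in sentence)
    kept = set(special_ids) | {tid for tid, c in counts.items() if c >= min_count}
    return sorted(kept)


def shrink_embeddings(embedding_matrix, kept_ids):
    """Slice the embedding matrix to the kept rows and build an old->new id map."""
    old_to_new = {old: new for new, old in enumerate(kept_ids)}
    return [list(embedding_matrix[old]) for old in kept_ids], old_to_new


def remap(sentence, old_to_new):
    """Re-express a tokenized sentence in the new, smaller id space.
    Ids outside the kept set raise KeyError; a real pipeline would map them
    to the unknown token instead."""
    return [old_to_new[tid] for tid in sentence]


# Toy setup: a 10-token vocabulary with 3-d embeddings (row i is [i, i, i]).
vocab_size, dim = 10, 3
embeddings = [[float(i)] * dim for i in range(vocab_size)]
special_ids = [0, 1, 2, 3]             # hypothetical ids for padding/unknown/start/end tokens
corpus = [[2, 4, 7, 3], [2, 4, 4, 3]]  # target-language sentences as original ids

kept = select_vocabulary(corpus, special_ids)
small, old_to_new = shrink_embeddings(embeddings, kept)
saved = (vocab_size - len(kept)) * dim  # embedding parameters removed
print(kept, remap(corpus[0], old_to_new), saved)
```

Because the embedding layer holds most of a multilingual model's parameters, the savings scale with (original vocabulary size - kept size) x hidden size. The tokenizer's vocabulary file would also need the same filtering so new text maps to the new ids directly; that step is omitted here.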
KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation
Chen, Wenhu, Su, Yu, Yan, Xifeng, Wang, William Yang
Data-to-text generation has recently attracted substantial interest due to its wide applications. Existing methods have shown impressive performance on an array of tasks. However, they rely on a significant amount of labeled data for each task, which is costly to acquire and thus limits their application to new tasks and domains. In this paper, we propose to leverage pre-training and transfer learning to address this issue. We propose knowledge-grounded pre-training (KGPT), which consists of two parts: 1) a general knowledge-grounded generation model to generate knowledge-enriched text, and 2) a pre-training paradigm on a massive knowledge-grounded text corpus crawled from the web. The pre-trained model can be fine-tuned on various data-to-text generation tasks to generate task-specific text. We adopt three settings, namely fully supervised, zero-shot, and few-shot, to evaluate its effectiveness. Under the fully supervised setting, our model achieves remarkable gains over the known baselines. Under the zero-shot setting, our model, without seeing any examples, achieves over 30 ROUGE-L on WebNLG while all other baselines fail. Under the few-shot setting, our model needs only about one-fifteenth as many labeled examples to achieve the same level of performance as baseline models. These experiments consistently demonstrate the strong generalization ability of our proposed framework. The code is available at https://github.com/wenhuchen/KGPT.
This Week's Awesome Tech Stories From Around the Web (Through October 10)
GPT-3 Bot Spends a Week Replying on Reddit, Starts Talking About the Illuminati (Rhett Jones, Gizmodo): "...the length of the replies was especially unusual in that they were sometimes coming within a minute of the question first being asked. After an impressive run, the user was revealed to be a bot using OpenAI's remarkable language model GPT-3."
The Quantum Internet Will Blow Your Mind. Here's What It Will Look Like (Dan Hurley, Discover): "Fifty or so miles east of New York City, on the campus of Brookhaven National Laboratory, Eden Figueroa is one of the world's pioneering gardeners planting the seeds of a quantum internet. Capable of sending enormous amounts of data over vast distances, it would work not just faster than the current internet but faster than the speed of light--instantaneously, in fact, like the teleportation of Mr. Spock and Captain Kirk in Star Trek."
This Robot Fry Chef on Rails Can Be Yours for $30,000 (James Vincent, The Verge): "Like Flippy before it, Flippy ROAR is designed to automate simple food prep, specifically anything involving fryers and grills."
This AI lyrics generator strings your random words into songs
Songwriter's block can be a problem for even the world's most successful musicians. They can sometimes overcome it by taking breaks, seeking new forms of inspiration, or simply pushing through. And if none of that works, they could try out a new AI lyrics generator called keyword2lyrics. "Sometimes I have a few ideas that I want to turn into a song, but I'm too lazy for that, so I thought it would be cool to make a program that generates lyrics from isolated keywords or phrases," explains its creator, Gatthi. Gatthi developed the tool by training OpenAI's GPT-2 language model on songs that Google lists when you search for "top artists 20th century" and "top artists 21st century," and by extracting keywords from them using a tool called yake.
Someone let a GPT-3 bot loose on Reddit -- it didn't end well
A GPT-3-powered bot has been caught posing as a human on Reddit after more than a week of rampant posting on one of the site's most popular subreddits. Under the username thegentlemetre, the bot had been churning out a post per minute on /r/AskReddit, a sub with more than 30 million users. That behavior raised the suspicions of writer Philip Winston. "I read through some of the posts and they reminded me of text I'd seen from OpenAI's language model GPT-3," Winston wrote on his blog. Winston shared his theory on the subreddit /r/GPT3. Another Redditor named Wiskkey noticed that the structure of its writing was similar to that used by the Philosopher AI, a controversial text generator powered by GPT-3.
What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding
In recent years, pre-trained Transformers have dominated the majority of NLP benchmark tasks. Many variants of pre-trained Transformers keep emerging, and most focus on designing different pre-training objectives or variants of self-attention. Embedding position information in the self-attention mechanism is also an indispensable factor in Transformers, yet it is often treated casually. Therefore, this paper carries out an empirical study on the position embeddings of mainstream pre-trained Transformers, focusing mainly on two questions: 1) Do position embeddings really learn the meaning of positions? 2) How do these different learned position embeddings affect Transformers on NLP tasks? This paper aims to provide new insight into pre-trained position embeddings through feature-level analysis and empirical experiments on most of the iconic NLP tasks. We believe our experimental results can guide future work in choosing a suitable positional encoding function for a specific task, given its application properties.
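As a concrete instance of feature-level analysis, the sketch below builds the fixed sinusoidal encoding from the original Transformer and measures how similar two positions are. It is an illustration, not the paper's procedure: a learned table from a pre-trained model could be analyzed the same way by substituting its rows for the output of `sinusoidal_encoding` (a hypothetical helper name). For the sinusoidal function, the similarity depends only on the offset between positions, because sin(a)sin(b) + cos(a)cos(b) = cos(a - b); learned embeddings need not have this property, which is the kind of question such a study can probe.

```python
import math


def sinusoidal_encoding(num_positions, dim):
    """Transformer sinusoidal encoding:
    PE[pos][2i] = sin(pos / 10000^(2i/dim)), PE[pos][2i+1] = cos(pos / 10000^(2i/dim))."""
    if dim % 2:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim // 2):
            angle = pos / (10000 ** (2 * i / dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


def cosine(u, v):
    """Cosine similarity between two position vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


pe = sinusoidal_encoding(num_positions=64, dim=16)
# The same offset (4) at different absolute positions gives the same similarity.
print(cosine(pe[3], pe[7]), cosine(pe[40], pe[44]))
```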
GenAug: Data Augmentation for Finetuning Text Generators
Feng, Steven Y., Gangal, Varun, Kang, Dongyeop, Mitamura, Teruko, Hovy, Eduard
In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We propose and evaluate various augmentation methods, including some that incorporate external knowledge, for finetuning GPT-2 on a subset of Yelp Reviews. We also examine the relationship between the amount of augmentation and the quality of the generated text. We utilize several metrics that evaluate important aspects of the generated text including its diversity and fluency. Our experiments demonstrate that insertion of character-level synthetic noise and keyword replacement with hypernyms are effective augmentation methods, and that the quality of generations improves to a peak at approximately three times the amount of original data.
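The two augmentation methods the abstract singles out can be illustrated with a short sketch. It is not the GenAug implementation: the abstract does not describe the exact noise process or where hypernyms come from, so this assumes random lowercase-letter insertion and a hypothetical toy hypernym table (a real setup might draw hypernyms from a lexical database such as WordNet). The helper names and the `copies` parameter, which sets how many augmented variants are produced per original text, are also hypothetical.

```python
import random
import string


def insert_char_noise(text, prob, rng):
    """After each character, insert a random lowercase letter with probability prob."""
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < prob:
            out.append(rng.choice(string.ascii_lowercase))
    return "".join(out)


# Hypothetical toy hypernym table (word -> more general word).
TOY_HYPERNYMS = {"pizza": "dish", "burger": "sandwich", "waiter": "worker"}


def replace_with_hypernyms(text, hypernyms, prob, rng):
    """Swap each known word for its hypernym with probability prob.
    Splits on whitespace only, so words with attached punctuation are left alone."""
    return " ".join(
        hypernyms[w] if w in hypernyms and rng.random() < prob else w
        for w in text.split()
    )


def augment(texts, copies, rng, noise_prob=0.05, hypernym_prob=0.5):
    """Return the originals plus `copies` augmented variants of each text,
    alternating between character noise and hypernym replacement."""
    result = list(texts)
    for text in texts:
        for k in range(copies):
            if k % 2 == 0:
                result.append(insert_char_noise(text, noise_prob, rng))
            else:
                result.append(replace_with_hypernyms(text, TOY_HYPERNYMS, hypernym_prob, rng))
    return result


rng = random.Random(0)  # fixed seed so the augmented set is reproducible
reviews = ["the pizza was great", "our waiter forgot the burger"]
for line in augment(reviews, copies=2, rng=rng):
    print(line)
```

Passing an explicit `random.Random` instance keeps augmentation reproducible across runs, which matters when comparing how generation quality changes with the amount of augmented data.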