AITopics

Industry:

Information Technology > Security & Privacy (0.40)
Government > Military > Cyberwarfare (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.82)

arXiv.org Artificial Intelligence, Oct-14-2020

Multi-label Few/Zero-shot Learning with Knowledge Aggregated from Multiple Label Graphs

Lu, Jueqing, Du, Lan, Liu, Ming, Dipnall, Joanna

Few/zero-shot learning is a major challenge in many classification tasks, where a classifier is required to recognise instances of classes that have very few or even no training samples. It becomes more difficult in multi-label classification, where each instance is labelled with more than one class. In this paper, we present a simple multi-graph aggregation model that fuses knowledge from multiple label graphs encoding different semantic label relationships, in order to study how the aggregated knowledge can benefit multi-label zero/few-shot document classification. The model utilises three kinds of semantic information: pre-trained word embeddings, label descriptions, and pre-defined label relations. Experimental results on two large clinical datasets (MIMIC-II and MIMIC-III) and the EU legislation dataset show that methods equipped with the multi-graph knowledge aggregation achieve significant performance improvements across almost all measures on few/zero-shot labels.

large language model, machine learning, natural language, (20 more...)

2010.07459

Country:

Oceania > Australia (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.83)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Yao, Zonghai, Cao, Liangliang, Pan, Huapu

Zero-shot Entity Linking with Efficient Long Range Sequence Modeling

arXiv.org Artificial Intelligence, Oct-12-2020

This paper considers the problem of zero-shot entity linking, in which a link at test time may not be present in training. Following the prevailing BERT-based research efforts, we find that a simple yet effective approach is to expand long-range sequence modeling. Unlike many previous methods, our method does not require expensive pre-training of BERT with long position embeddings. Instead, we propose an efficient position embeddings initialization method called Embedding-repeat, which initializes larger position embeddings based on BERT-Base. On Wikia's zero-shot EL dataset, our method improves the SOTA from 76.06% to 79.08%, and for its long data, the corresponding improvement is from 74.57% to 82.14%. Our experiments suggest the effectiveness of long-range sequence modeling.

[Figure 1: Only models with large ERLength can solve this entity linking problem because only they can get valuable critical information in the mention contexts and entity description.]
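The abstract says only that Embedding-repeat initializes a larger position-embedding table from BERT-Base. One plausible reading, sketched below as an assumption rather than the authors' implementation, is to repeat the existing rows in order until the longer table is filled. The function name `embedding_repeat` and the toy table are hypothetical; a real BERT-Base table would have 512 rows of 768 values each.

```python
def embedding_repeat(pos_emb, new_len):
    """Build a longer position-embedding table by cycling through the rows
    of an existing one (assumed reading of "Embedding-repeat").

    pos_emb: list of rows, one row (list of floats) per position.
    new_len: number of positions wanted in the enlarged table.
    """
    old_len = len(pos_emb)
    if old_len == 0:
        raise ValueError("pos_emb must contain at least one row")
    if new_len < old_len:
        raise ValueError("new_len must be at least the original length")
    # Position i in the new table copies row (i mod old_len) of the old one.
    return [list(pos_emb[i % old_len]) for i in range(new_len)]


# Toy stand-in for a pre-trained table: 4 positions, 2 dimensions each.
base = [[float(i), float(i) + 0.5] for i in range(4)]
longer = embedding_repeat(base, 10)
```

In this sketch the first block of the enlarged table is identical to the original, so short inputs see the same position vectors as before; the repeated rows are only a starting point that fine-tuning would then adjust.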

large language model, machine learning, natural language, (17 more...)

2010.06065

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.05)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Abdaoui, Amine, Pradel, Camille, Sigel, Grégoire

Load What You Need: Smaller Versions of Multilingual BERT

arXiv.org Artificial Intelligence, Oct-12-2020

Pre-trained Transformer-based models are achieving state-of-the-art results on a variety of Natural Language Processing data sets. However, the size of these models is often a drawback for their deployment in real production applications. In the case of multilingual models, most of the parameters are located in the embeddings layer. Therefore, reducing the vocabulary size should have an important impact on the total number of parameters. In this paper, we propose to generate smaller models that handle a smaller number of languages according to the targeted corpora. We present an evaluation of smaller versions of multilingual BERT on the XNLI data set, but we believe that this method may be applied to other multilingual transformers. The obtained results confirm that we can generate smaller models that keep comparable results while reducing the total number of parameters by up to 45%. We compared our models with DistilmBERT (a distilled version of multilingual BERT) and showed that, unlike language reduction, distillation induced a 1.7% to 6% drop in overall accuracy on the XNLI data set. The presented models and code are publicly available.
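The link between vocabulary size and parameter count can be made concrete with a little arithmetic: a token-embedding matrix holds one vector of the hidden size per vocabulary entry. The sketch below uses round, illustrative numbers chosen for this example (they are assumptions, not figures reported in the paper), and the function names are hypothetical.

```python
def embedding_params(vocab_size, hidden_size):
    """Number of weights in a token-embedding matrix."""
    return vocab_size * hidden_size


def fraction_saved(total_params, old_vocab, new_vocab, hidden_size):
    """Share of the whole model removed by shrinking the vocabulary."""
    saved = embedding_params(old_vocab, hidden_size) - embedding_params(
        new_vocab, hidden_size
    )
    return saved / total_params


# Illustrative assumptions only: a 120k-token multilingual vocabulary cut
# to 30k tokens, hidden size 768, and roughly 178M parameters in total.
HIDDEN = 768
saved_weights = embedding_params(120_000, HIDDEN) - embedding_params(30_000, HIDDEN)
share = fraction_saved(178_000_000, 120_000, 30_000, HIDDEN)
print(f"weights removed: {saved_weights:,} ({share:.1%} of the model)")
```

Only the embedding matrix shrinks in this calculation; the Transformer layers are untouched, which is why this kind of reduction differs from distillation.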

large language model, machine learning, natural language, (19 more...)

2010.05609

Country: Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

arXiv.org Artificial Intelligence, Oct-11-2020

KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation

Chen, Wenhu, Su, Yu, Yan, Xifeng, Wang, William Yang

Data-to-text generation has recently attracted substantial interest due to its wide applications. Existing methods have shown impressive performance on an array of tasks. However, they rely on a significant amount of labeled data for each task, which is costly to acquire and thus limits their application to new tasks and domains. In this paper, we propose to leverage pre-training and transfer learning to address this issue. We propose knowledge-grounded pre-training (KGPT), which consists of two parts: 1) a general knowledge-grounded generation model to generate knowledge-enriched text, and 2) a pre-training paradigm on a massive knowledge-grounded text corpus crawled from the web. The pre-trained model can be fine-tuned on various data-to-text generation tasks to generate task-specific text. We adopt three settings, namely fully-supervised, zero-shot, and few-shot, to evaluate its effectiveness. Under the fully-supervised setting, our model can achieve remarkable gains over the known baselines. Under the zero-shot setting, our model, without seeing any examples, achieves over 30 ROUGE-L on WebNLG while all other baselines fail. Under the few-shot setting, our model only needs about one-fifteenth as many labeled examples to achieve the same level of performance as baseline models. These experiments consistently prove the strong generalization ability of our proposed framework (https://github.com/wenhuchen/KGPT).

large language model, machine learning, natural language, (18 more...)

2010.02307

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
Asia > Middle East > Iran (0.05)
Europe > Germany (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports > Basketball (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

#artificialintelligence, Oct-10-2020, 14:50:37 GMT

This Week's Awesome Tech Stories From Around the Web (Through October 10)

GPT-3 Bot Spends a Week Replying on Reddit, Starts Talking About the Illuminati Rhett Jones Gizmodo "..the length of the replies was especially unusual in that they were sometimes coming within a minute of the question first being asked. After an impressive run, the user was revealed to be a bot using OpenAI's remarkable language model GPT-3.

The Quantum Internet Will Blow Your Mind. Here's What It Will Look Like Dan Hurley Discover "Fifty or so miles east of New York City, on the campus of Brookhaven National Laboratory, Eden Figueroa is one of the world's pioneering gardeners planting the seeds of a quantum internet. Capable of sending enormous amounts of data over vast distances, it would work not just faster than the current internet but faster than the speed of light--instantaneously, in fact, like the teleportation of Mr. Spock and Captain Kirk in Star Trek."

This Robot Fry Chef on Rails Can Be Yours for $30,000 James Vincent The Verge "Like Flippy before it, Flippy ROAR is designed to automate simple food prep, specifically anything involving fryers and grills.

large language model, machine learning, natural language, (8 more...)

Country:

North America > United States > New York (0.26)
North America > United States > California (0.06)
North America > Canada (0.06)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.79)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.79)

#artificialintelligence, Oct-10-2020, 07:50:15 GMT

This AI lyrics generator strings your random words into songs

Songwriter's block can be a problem for even the world's most successful musicians. They can sometimes overcome it by taking breaks, seeking new forms of inspiration, or simply pushing through. And if none of that works, they could try out a new AI lyrics generator called keyword2lyrics. Sometimes I have a few ideas that I want to turn into a song, but I'm too lazy for that, so I thought it would be cool to make a program that generates lyrics from isolated keywords or phrases. Gatthi developed the tool by training OpenAI's GPT-2 language model on songs that Google lists when you search for "top artists 20th century" and "top artists 21st century," and extracted keywords from them using a tool called yake.

large language model, machine learning, natural language, (13 more...)

Country: South America > Argentina (0.06)

Industry:

Media > Music (0.37)
Leisure & Entertainment (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.57)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)

#artificialintelligence, Oct-10-2020, 07:50:06 GMT

Someone let a GPT-3 bot loose on Reddit -- it didn't end well

A GPT-3-powered bot has been caught posing as a human on Reddit after more than a week of rampant posting on one of the site's most popular subreddits. Under the username of thegentlemetre, the bot had been churning out a post per minute on /r/AskReddit, a sub with more than 30 million users. That behavior raised the suspicions of writer Philip Winston. "I read through some of the posts and they reminded me of text I'd seen from OpenAI's language model GPT-3," Winston wrote on his blog. Winston shared his theory on the subreddit /r/GPT3. Another Redditor named Wiskkey noticed that the structure of its writing was similar to that used by the Philosopher AI, a controversial text generator powered by GPT-3.

large language model, machine learning, thegentlemetre, (13 more...)

Industry: Media > News (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Wang, Yu-An, Chen, Yun-Nung

What Do Position Embeddings Learn? An Empirical Study of Pre-Trained Language Model Positional Encoding

arXiv.org Artificial Intelligence, Oct-10-2020

In recent years, pre-trained Transformers have dominated the majority of NLP benchmark tasks. Many variants of pre-trained Transformers have kept emerging, and most focus on designing different pre-training objectives or variants of self-attention. Embedding position information in the self-attention mechanism is also an indispensable factor in Transformers; however, it is often discussed at will. Therefore, this paper carries out an empirical study on the position embeddings of mainstream pre-trained Transformers, focusing mainly on two questions: 1) Do position embeddings really learn the meaning of positions? 2) How do these different learned position embeddings affect Transformers for NLP tasks? This paper focuses on providing new insight into pre-trained position embeddings through feature-level analysis and empirical experiments on most iconic NLP tasks. It is believed that our experimental results can guide future work in choosing a suitable positional encoding function for specific tasks given the application property.

information, large language model, machine learning, (18 more...)

2010.04903

Country:

North America > United States > Michigan (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)

arXiv.org Artificial Intelligence, Oct-10-2020

GenAug: Data Augmentation for Finetuning Text Generators

Feng, Steven Y., Gangal, Varun, Kang, Dongyeop, Mitamura, Teruko, Hovy, Eduard

In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We propose and evaluate various augmentation methods, including some that incorporate external knowledge, for finetuning GPT-2 on a subset of Yelp Reviews. We also examine the relationship between the amount of augmentation and the quality of the generated text. We utilize several metrics that evaluate important aspects of the generated text including its diversity and fluency. Our experiments demonstrate that insertion of character-level synthetic noise and keyword replacement with hypernyms are effective augmentation methods, and that the quality of generations improves to a peak at approximately three times the amount of original data.
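The abstract names character-level synthetic noise as one effective augmentation but does not spell out the operations used. As a minimal sketch under that caveat, the example below swaps two adjacent interior characters in randomly chosen words; the function name `add_char_noise`, the swap operation, and the default probability are all assumptions made for illustration, not the paper's procedure.

```python
import random


def add_char_noise(text, prob=0.1, seed=None):
    """Return a copy of text in which some words have two adjacent
    interior characters swapped (hypothetical noise operation).

    prob: chance that an eligible word is perturbed.
    seed: optional seed so the augmentation is reproducible.
    """
    rng = random.Random(seed)
    noisy_words = []
    for word in text.split():
        # Only touch words long enough to keep first and last letters fixed.
        if len(word) > 3 and rng.random() < prob:
            i = rng.randrange(1, len(word) - 2)
            chars = list(word)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            word = "".join(chars)
        noisy_words.append(word)
    return " ".join(noisy_words)


sample = "the food was great and the service friendly"
augmented = add_char_noise(sample, prob=1.0, seed=1)
```

Each augmented copy keeps the word count and the letters of every word, so it can be added to the fine-tuning set alongside the original review text.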

augmentation, computational linguistic, continuation, (16 more...)

2010.01794

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.04)
Oceania > Kiribati (0.04)

Genre: Research Report (0.82)

Industry: Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.72)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)