 Large Language Model


K-XLNet: A General Method for Combining Explicit Knowledge with Language Model Pretraining

arXiv.org Artificial Intelligence

Though pre-trained language models such as BERT and XLNet have rapidly advanced the state of the art on many NLP tasks, they capture semantics only implicitly, relying on surface information between words in a corpus. Intuitively, background knowledge influences the efficacy of understanding. Inspired by this common sense, we focus on improving model pretraining by leveraging explicit knowledge. Different from recent research that optimizes pretraining models through knowledge masking strategies, we propose a simple but general method to combine explicit knowledge with pretraining. To be specific, we first match knowledge facts from a knowledge graph (KG) and then add a knowledge injection layer to the transformer directly without changing its architecture. The present study seeks to find the direct impact of explicit knowledge on transformer pre-training. We conduct experiments on various datasets for different downstream tasks. The experimental results show that solely adding external knowledge to the transformer can improve learning performance on many NLP tasks.
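The abstract above does not say how facts are matched or injected, so the following is only a minimal sketch of the general idea, assuming exact string matching against a toy knowledge graph and a text-level injection that appends matched triples to the input. The names `TOY_KG`, `match_facts`, `inject_facts`, and the `[KG]` marker are hypothetical and are not taken from the paper.

```python
# Toy knowledge graph of (head, relation, tail) triples.
# Hypothetical data, for illustration only.
TOY_KG = [
    ("Paris", "capital_of", "France"),
    ("XLNet", "instance_of", "language model"),
]


def match_facts(tokens, kg):
    """Return the triples whose head entity appears among the tokens."""
    present = set(tokens)
    return [triple for triple in kg if triple[0] in present]


def inject_facts(tokens, kg):
    """Append each matched fact to the token sequence after a [KG] marker.

    This is a text-level stand-in for a knowledge injection step, not the
    layer described in the paper.
    """
    enriched = list(tokens)
    for head, relation, tail in match_facts(tokens, kg):
        enriched += ["[KG]", head, relation, tail]
    return enriched


tokens = "Paris is beautiful in spring".split()
enriched = inject_facts(tokens, TOY_KG)
```

Here the model input would carry the fact about Paris alongside the original sentence, while the transformer itself stays unchanged, which is the property the abstract emphasizes.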


Are Neural Language Models Good Plagiarists? A Benchmark for Neural Paraphrase Detection

arXiv.org Artificial Intelligence

The rise of language models such as BERT allows for high-quality text paraphrasing. This is a problem for academic integrity, as it is difficult to differentiate between original and machine-generated content. We propose a benchmark consisting of articles paraphrased using recent language models relying on the Transformer architecture. Our contribution fosters future research on paraphrase detection systems, as it offers a large collection of aligned original and paraphrased documents, a study regarding its structure, and classification experiments with state-of-the-art systems. We make our findings publicly available.


Cooperative Learning of Zero-Shot Machine Reading Comprehension

arXiv.org Artificial Intelligence

Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings. However, learning question answering models still needs large-scale data annotation in specific domains. In this work, we propose a cooperative, self-play learning framework, REGEX, for question generation and answering. REGEX is built upon a masked answer extraction task with an interactive learning environment containing an answer entity REcognizer, a question Generator, and an answer EXtractor. Given a passage with a masked entity, the generator generates a question around the entity, and the extractor is trained to extract the masked entity with the generated question and raw texts. The framework allows the training of question generation and answering models on any text corpora without annotation. We further leverage a reinforcement learning technique to reward generating high-quality questions and to improve the answer extraction model's performance. Experimental results show that REGEX outperforms the state-of-the-art (SOTA) pretrained language models and zero-shot approaches on standard question-answering benchmarks, and yields new SOTA performance under the zero-shot setting.
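The masked answer extraction setup described above can be illustrated with a small sketch. It only shows how an unannotated passage might be turned into a training example and scored; the helper names, the `[MASK]` token, and the exact-match reward are assumptions made for illustration, since the abstract does not specify the reward or the models.

```python
MASK = "[MASK]"  # assumed placeholder token, not taken from the paper


def make_masked_example(passage, entity):
    """Mask the first occurrence of `entity` in `passage`.

    Returns (masked_passage, answer), or None if the entity is absent.
    """
    start = passage.find(entity)
    if start == -1:
        return None
    masked = passage[:start] + MASK + passage[start + len(entity):]
    return masked, entity


def extraction_reward(predicted, answer):
    """Exact-match reward (an assumption): 1.0 if the prediction recovers
    the masked entity, ignoring case and surrounding whitespace."""
    return 1.0 if predicted.strip().lower() == answer.strip().lower() else 0.0


example = make_masked_example(
    "Marie Curie won the Nobel Prize in 1903.", "Marie Curie"
)
```

In the framework as described, a generator would write a question about the masked entity and an extractor would answer it from the raw text; a score like `extraction_reward` is one simple way such an answer could be checked without human annotation.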


Okay, the GPT-3 hype seems pretty reasonable – TechCrunch

#artificialintelligence

This morning TechCrunch covered an interesting round for Copy.ai, a startup that employs GPT-3 to help other companies with their writing projects. GPT-3, or Generative Pre-trained Transformer 3, is a piece of AI from the OpenAI group that takes text from the user, and writes a lot more for them. As part of the process of covering the Copy.ai I've long been more curious than afraid of automated writing. So when the Copy team described their very positive impressions of the GPT-3 AI writing tool to TechCrunch during an interview, I was intrigued.


Extra Crunch roundup: Coupang and Roblox debut, driving GPT-3 adoption, startup how-tos, more – TechCrunch

#artificialintelligence

Extra Crunch publishes a variety of article types, but how-tos are my favorite category. For many entrepreneurs, the startup they are trying to get off the ground might be only the second entry on their resume. As a result, they don't have much experience to draw from when it comes to basics like hiring, fundraising and growth marketing. Last week, Natasha Mascarenhas interviewed experts who had some strategic advice for finding the right time to bring a product manager on board. This afternoon, we published a guest post by growth marketer Jessica Li with tips for "how nontechnical talent can build relationships with deep tech companies."


OpenAI's Sam Altman: Artificial Intelligence will generate enough wealth to pay each adult $13,500 a year

#artificialintelligence

Artificial intelligence will create so much wealth that every adult in the United States could be paid $13,500 per year from its windfall as soon as 10 years from now. So says Sam Altman, co-founder and president of San Francisco-headquartered, artificial intelligence-focused nonprofit OpenAI. "My work at OpenAI reminds me every day about the magnitude of the socioeconomic change that is coming sooner than most people believe," Altman posted Tuesday. "Software that can think and learn will do more and more of the work that people now do." Altman calls it an "AI revolution," and compares it in magnitude to the agricultural, industrial and computational technological revolutions.


Adventures with AI: Here's what happened when I ate a three course meal designed by artificial intelligence

#artificialintelligence

Welcome to Adventures with AI, a column exploring what happens when artificial intelligence takes control of everyday tasks. Eating out is one of my great pleasures; cooking is not. Unfortunately, since the onset of the COVID-19 pandemic, I've been doing a lot of the latter and almost none of the former. Preparing meals has become particularly tedious during London's latest lockdown. So like an unhappy couple in a sexless marriage, I've been trying to spice things up in my domestic life.


The key to making AI green is quantum computing

#artificialintelligence

We've painted ourselves into another corner with artificial intelligence. We're finally starting to break through the usefulness barrier, but we're butting up against the limits of our ability to responsibly meet our machines' massive energy requirements. At the current rate of growth, it appears we'll have to turn Earth into Coruscant if we want to keep spending unfathomable amounts of energy training systems such as GPT-3. The problem: Simply put, AI takes too much time and energy to train. A layperson might imagine a bunch of code on a laptop screen when they think about AI development, but the truth is that many of the systems we use today were trained on massive GPU networks, supercomputers, or both.


All NLP Tasks Are Generation Tasks: A General Pretraining Framework

arXiv.org Artificial Intelligence

There have been various types of pretraining architectures including autoregressive models (e.g., GPT), autoencoding models (e.g., BERT), and encoder-decoder models (e.g., T5). On the other hand, NLP tasks are different in nature, with three main categories being classification, unconditional generation, and conditional generation. However, none of the pretraining frameworks performs the best for all tasks, which introduces inconvenience for model development and selection. We propose a novel pretraining framework GLM (General Language Model) to address this challenge. Compared to previous work, our architecture has three major benefits: (1) it performs well on classification, unconditional generation, and conditional generation tasks with one single pretrained model; (2) it outperforms BERT-like models on classification due to improved pretrain-finetune consistency; (3) it naturally handles variable-length blank filling which is crucial for many downstream tasks. Empirically, GLM substantially outperforms BERT on the SuperGLUE natural language understanding benchmark with the same amount of pre-training data. Moreover, GLM with 1.25x parameters of BERT-Large achieves the best performance in NLU, conditional and unconditional generation at the same time, which demonstrates its generalizability to different downstream tasks.
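Variable-length blank filling, mentioned in point (3) above, can be pictured with a short sketch: spans of different lengths are each replaced by a single placeholder, and the model's job is to regenerate the missing tokens. This is a simplified illustration under stated assumptions, not GLM's actual training procedure; the function name `blank_out` and the `[MASK]` token are hypothetical.

```python
def blank_out(tokens, spans):
    """Replace each (start, end) span with one [MASK] token.

    Spans are half-open index pairs, sorted and non-overlapping.
    Returns (corrupted_tokens, targets), where targets[i] holds the
    tokens hidden behind the i-th [MASK].
    """
    corrupted, targets, cursor = [], [], 0
    for start, end in spans:
        corrupted += tokens[cursor:start] + ["[MASK]"]
        targets.append(tokens[start:end])
        cursor = end
    corrupted += tokens[cursor:]
    return corrupted, targets


tokens = "the quick brown fox jumps over the lazy dog".split()
corrupted, targets = blank_out(tokens, [(1, 3), (5, 6)])
```

Because one placeholder can stand for a span of any length, the corrupted input does not reveal how many tokens are missing, which is what distinguishes this setup from masking one token at a time.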


TRIC -- Transformer-based Relative Image Captioning

#artificialintelligence

This blog post describes the TRIC model -- an architecture for the Relative Image Captioning task that was created as a part of my master's thesis. All of them are described in my thesis in a pretty concise way, so I highly recommend it -- you can find a link right below. But if you want to check them from another source, that is also covered. To each of the topics listed above, I have attached a link to my personal favorite resource concerning this particular subject. Earlier this month I defended my master's thesis in Computer Science at the Warsaw University of Technology.