Large Language Model
Exploring Length Generalization in Large Language Models
Anil, Cem, Wu, Yuhuai, Andreassen, Anders, Lewkowycz, Aitor, Misra, Vedant, Ramasesh, Vinay, Slone, Ambrose, Gur-Ari, Guy, Dyer, Ethan, Neyshabur, Behnam
The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These include theorem proving, solving quantitative mathematics problems, and reading/summarizing novels. In this paper, we run careful empirical studies exploring the length generalization capabilities of transformer-based language models. We first establish that naively finetuning transformers on length generalization tasks shows significant generalization deficiencies independent of model scale. We then show that combining pretrained large language models' in-context learning abilities with scratchpad prompting (asking the model to output solution steps before producing an answer) results in a dramatic improvement in length generalization. We run careful failure analyses on each of the learning modalities and identify common sources of mistakes that highlight opportunities in equipping language models with the ability to generalize to longer problems.
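As a rough illustration of scratchpad prompting, the sketch below builds a few-shot prompt in which each solved example writes out its intermediate steps before the answer. The parity task, the step wording, and the helper names `parity_scratchpad` and `build_prompt` are assumptions chosen for illustration; they are not taken from the abstract.

```python
from typing import List, Tuple


def parity_scratchpad(bits: List[int]) -> Tuple[List[str], int]:
    """Return the written-out steps and the final parity of a bit list."""
    state = 0
    steps = []
    for i, bit in enumerate(bits, start=1):
        state ^= bit  # running parity after reading this bit
        steps.append(f"step {i}: bit={bit}, running parity={state}")
    return steps, state


def build_prompt(examples: List[List[int]], query: List[int]) -> str:
    """Few-shot prompt: solved short examples with steps, then an unsolved query."""
    blocks = []
    for bits in examples:
        steps, answer = parity_scratchpad(bits)
        header = "Input: " + " ".join(map(str, bits))
        blocks.append(header + "\n" + "\n".join(steps) + f"\nAnswer: {answer}")
    # The query is left open so the model continues with its own steps.
    blocks.append("Input: " + " ".join(map(str, query)) + "\n")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Short solved examples, longer query: the length-generalization setting.
    print(build_prompt([[1, 0], [1, 1, 0]], [1, 0, 1, 1, 0, 1]))
```

The solved examples are deliberately shorter than the query, which mirrors the short-to-long setting the abstract describes.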
RankGen: Improving Text Generation with Large Ranking Models
Krishna, Kalpesh, Chang, Yapei, Wieting, John, Iyyer, Mohit
Given an input sequence (or prefix), modern language models often assign high probabilities to output sequences that are repetitive, incoherent, or irrelevant to the prefix; as such, model-generated text also contains such artifacts. To address these issues we present RankGen, a 1.2B parameter encoder model for English that scores model generations given a prefix. RankGen can be flexibly incorporated as a scoring function in beam search and used to decode from any pretrained language model. We train RankGen using large-scale contrastive learning to map a prefix close to the ground-truth sequence that follows it and far away from two types of negatives: (1) random sequences from the same document as the prefix, and (2) sequences generated from a large language model conditioned on the prefix. Experiments across four different language models (345M-11B parameters) and two domains show that RankGen significantly outperforms decoding algorithms like nucleus, top-k, and typical sampling, as well as contrastive decoding and search, on both automatic metrics (85.0 vs 77.3 MAUVE over nucleus) as well as human evaluations with English writers (74.5% human preference over nucleus sampling). Analysis reveals that RankGen outputs are more relevant to the prefix and improve continuity and coherence compared to baselines. We release our model checkpoints, code, and human preference data with explanations to facilitate future research.
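The scoring idea can be sketched as reranking: embed the prefix and each candidate continuation, then prefer the candidate whose vector lies closest to the prefix vector. The snippet below is a minimal sketch under stated assumptions: the embeddings are hand-written toy vectors, similarity is taken to be a dot product, and `rerank` is a hypothetical helper, not the released RankGen API.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rerank(
    prefix_vec: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Sort candidate continuations by similarity to the prefix, best first."""
    scored = [(text, dot(prefix_vec, vec)) for text, vec in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors standing in for encoder outputs (assumed, not real embeddings).
    prefix = [1.0, 0.0]
    candidates = [
        ("an off-topic continuation", [0.2, 0.9]),
        ("a relevant continuation", [0.9, 0.1]),
    ]
    for text, score in rerank(prefix, candidates):
        print(f"{score:.2f}  {text}")
```

In a beam-search setting the same scoring step would be applied to partial generations at each step rather than once to finished samples.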
Interesting Facts About Artificial Intelligence You May Not Know
There's no doubt that artificial intelligence is one of the most fascinating and rapidly growing fields in technology today. But there are still many things about AI that remain unknown to the average person. In this blog post, we will explore 10 interesting facts about artificial intelligence that you may not know. Keep reading to learn more! The most powerful artificial intelligence-based text generator available today, OpenAI's GPT-2, can write entire paragraphs and is error-free, but it has difficulty establishing causal relationships.
Efficient Zero-shot Event Extraction with Context-Definition Alignment
Zhang, Hongming, Yao, Wenlin, Yu, Dong
Event extraction (EE) is the task of identifying event mentions of interest in text. Conventional efforts mainly focus on the supervised setting. However, these supervised models cannot generalize to event types outside the pre-defined ontology. To fill this gap, many efforts have been devoted to the zero-shot EE problem. This paper follows the trend of modeling event-type semantics but moves one step further. We argue that using the static embedding of the event type name might not be enough because a single word could be ambiguous, and we need a sentence to define the type semantics accurately. To model the definition semantics, we use two separate transformer models to project the contextualized event mentions and corresponding definitions into the same embedding space and then minimize their embedding distance via contrastive learning. On top of that, we also propose a warming phase to help the model learn the minor differences between similar definitions. We name our approach Zero-shot Event extraction with Definition (ZED). Experiments on the MAVEN dataset show that our model significantly outperforms all previous zero-shot EE methods with fast inference speed due to the disjoint design. Further experiments also show that ZED can be easily applied to the few-shot setting when the annotation is available and consistently outperforms baseline supervised methods.
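The alignment objective can be pictured with a small contrastive loss: each mention embedding should be closer to its own definition embedding than to the other definitions in the batch. The sketch below assumes an InfoNCE-style loss with cosine similarity and in-batch negatives; the abstract does not specify the exact loss, so the function name, the temperature value, and the toy vectors are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


def contrastive_loss(
    mentions: List[Sequence[float]],
    definitions: List[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Mean InfoNCE loss; mentions[i] is paired with definitions[i]."""
    total = 0.0
    for i, mention in enumerate(mentions):
        logits = [cosine(mention, d) / temperature for d in definitions]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(mentions)


if __name__ == "__main__":
    mentions = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned loss:", contrastive_loss(mentions, aligned))
    print("swapped loss:", contrastive_loss(mentions, swapped))
```

Because the two encoders are separate, definition embeddings can be computed once and reused at inference time, which is consistent with the fast inference the abstract attributes to the disjoint design.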
Transformers on Multilingual Clause-Level Morphology
Acikgoz, Emre Can, Chubakov, Tilek, Kural, Müge, Şahin, Gözde Gül, Yuret, Deniz
This paper describes our winning systems in MRL: The 1st Shared Task on Multilingual Clause-level Morphology (EMNLP 2022 Workshop) designed by KUIS AI NLP team. We present our work for all three parts of the shared task: inflection, reinflection, and analysis. We mainly explore transformers with two approaches: (i) training models from scratch in combination with data augmentation, and (ii) transfer learning with prefix-tuning at multilingual morphological tasks. Data augmentation significantly improves performance for most languages in the inflection and reinflection tasks. On the other hand, prefix-tuning on a pre-trained mGPT model helps us to adapt analysis tasks in low-data and multilingual settings. While transformer architectures with data augmentation achieved the most promising results for inflection and reinflection tasks, prefix-tuning on mGPT received the highest results for the analysis task. Our systems received 1st place in all three tasks in MRL 2022.
Which AI image details are most impacted by the text you type?
Prompt engineering is a machine learning and natural language processing (NLP) concept. In prompt engineering, the task description is contained in the input, e.g., in the form of a text, instead of being given implicitly. Effective text prompts must be delivered in a specific format. Unfortunately, text-based prompts need to be better documented, and generative systems and artists do not always share prompts. How to study the practices of text-based generative art?
GPT-3: What is GPT-3 and what can it do for your business? - Kavita Ganesan, PhD
There's been a lot of talk about GPT-3 and generative AI in the news, social media, and probably from every AI practitioner or vendor whom you've been speaking with lately. Everyone is super excited about the future that such AI tools hold. But what exactly is this AI technology, and what does it mean for your business and your AI problems? GPT-3 is a large language model developed by OpenAI. It's the successor of OpenAI's older language model, GPT-2, which was much smaller in comparison.
Meet Spellbook the GPT-3 Generative AI Word Add-In For Contracts
In another example of the use of generative AI approaches in the legal sector, Toronto-based Rally has launched a GPT-3 based add-in for Word called Spellbook, which is designed to help lawyers with legal drafting. Spellbook's use of OpenAI's GPT-3 large language model, an AI trained on 45 terabytes of data from books and the internet, is further 'tuned' on legal datasets for 'optimal contracting performance', they explained. Artificial Lawyer was understandably curious to know some more, especially after recently highlighting the work by PatentPal, which uses a non-GPT-3 generative AI model. This site asked Scott Stevenson, CEO of Rally – which provides its core legal management platform to 110 law firms – about how they are leveraging this technology. When did this start and what is Rally?
Top 50 NLP Interview Questions and Answers in 2023
Natural Language Processing helps machines understand and analyze natural languages. NLP is an automated process that helps extract the required information from data by applying machine learning algorithms. Learning NLP will help you land a high-paying job as it is used by various professionals such as data scientist professionals, machine learning engineers, etc. We have compiled a comprehensive list of NLP Interview Questions and Answers that will help you prepare for your upcoming interviews. You can also check out these free NLP courses to help with your preparation. Once you have prepared the following commonly asked questions, you can get into the job role you are looking for. Without further ado, let's kickstart your NLP learning journey.