Large Language Model
Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models
Aksitov, Renat, Chang, Chung-Ching, Reitter, David, Shakeri, Siamak, Sung, Yunhsuan
Despite recent progress, it has been difficult to prevent semantic hallucinations in generative Large Language Models. One common solution to this is augmenting LLMs with a retrieval system and making sure that the generated output is attributable to the retrieved information. Given this new added constraint, it is plausible to expect that the overall quality of the output will be affected, for example, in terms of fluency. Can scaling language models help? Here we examine the relationship between fluency and attribution in LLMs prompted with retrieved evidence in knowledge-heavy dialog settings. Our experiments were implemented with a set of auto-metrics that are aligned with human preferences. They were used to evaluate a large set of generations, produced under varying parameters of LLMs and supplied context. We show that larger models tend to do much better in both fluency and attribution, and that (naively) using top-k retrieval versus top-1 retrieval improves attribution but hurts fluency. We next propose a recipe that could allow smaller models to both close the gap with larger models and preserve the benefits of top-k retrieval while avoiding its drawbacks.
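As a rough illustration of the top-1 versus top-k contrast described in the abstract above, the sketch below assembles a prompt from the k highest-scoring retrieved passages. The function name `build_prompt`, the prompt layout, and the toy passages are assumptions made for illustration; the paper's actual prompt format is not given here.

```python
def build_prompt(question, scored_passages, k=1):
    """Assemble a prompt from the k highest-scoring retrieved passages.

    scored_passages is a list of (text, retrieval_score) pairs.
    The prompt layout is a hypothetical one, not the paper's.
    """
    # Sort by retrieval score, highest first, and keep the top k.
    top = sorted(scored_passages, key=lambda p: p[1], reverse=True)[:k]
    # Number each passage so a generated answer could point back to it.
    evidence = "\n".join(f"[{i + 1}] {text}" for i, (text, _) in enumerate(top))
    return f"Evidence:\n{evidence}\n\nUser: {question}\nAssistant:"


# Toy passages with made-up retrieval scores.
passages = [
    ("The Eiffel Tower is 330 metres tall.", 0.91),
    ("The Eiffel Tower was completed in 1889.", 0.84),
    ("Paris is the capital of France.", 0.40),
]

top1_prompt = build_prompt("How tall is the Eiffel Tower?", passages, k=1)
top3_prompt = build_prompt("How tall is the Eiffel Tower?", passages, k=3)
```

With k=3 the model sees more evidence to attribute to, but also more loosely related text, which is one plausible reading of why naive top-k retrieval could help attribution while hurting fluency.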
How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval
Lin, Sheng-Chieh, Asai, Akari, Li, Minghan, Oguz, Barlas, Lin, Jimmy, Mehdad, Yashar, Yih, Wen-tau, Chen, Xilun
Various techniques have been developed in recent years to improve dense retrieval (DR), such as unsupervised contrastive learning and pseudo-query generation. Existing DRs, however, often suffer from effectiveness tradeoffs between supervised and zero-shot retrieval, which some argue is due to limited model capacity. We contradict this hypothesis and show that a generalizable DR can be trained to achieve high accuracy in both supervised and zero-shot retrieval without increasing model size. In particular, we systematically examine the contrastive learning of DRs, under the framework of Data Augmentation (DA). Our study shows that common DA practices, such as query augmentation with generative models and pseudo-relevance label creation using a cross-encoder, are often inefficient and sub-optimal. We hence propose a new DA approach with diverse queries and sources of supervision to progressively train a generalizable DR. As a result, DRAGON, our dense retriever trained with diverse augmentation, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations and even competes with models using more complex late interaction (ColBERTv2 and SPLADE++).
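For readers unfamiliar with contrastive learning of dense retrievers, the sketch below computes a generic in-batch contrastive (InfoNCE-style) loss over dot-product scores. It is a minimal, assumed formulation, not DRAGON's training recipe: the function names and the toy vectors are hypothetical, and the abstract does not specify the loss.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(query_vecs, passage_vecs):
    """Mean contrastive loss where passage_vecs[i] is the positive for query_vecs[i].

    Every other passage in the batch acts as a negative for that query.
    """
    losses = []
    for i, q in enumerate(query_vecs):
        scores = [dot(q, p) for p in passage_vecs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the positive passage.
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)


# Toy embeddings: each query matches the passage at the same index.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = in_batch_contrastive_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is lower when each query scores its own passage above the others, which is the property that data augmentation (new queries, new relevance labels) tries to teach the encoder.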
A Brief Report on LawGPT 1.0: A Virtual Legal Assistant Based on GPT-3
LawGPT 1.0 is a virtual legal assistant built on the state-of-the-art language model GPT-3, fine-tuned for the legal domain. The system is designed to provide legal assistance to users in a conversational manner, helping them with tasks such as answering legal questions, generating legal documents, and providing legal advice. In this paper, we provide a brief overview of LawGPT 1.0, its architecture, and its performance on a set of legal benchmark tasks. Please note that the detailed information about the model is protected by a non-disclosure agreement (NDA) and cannot be disclosed in this report.
Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models
Prabhumoye, Shrimai, Patwary, Mostofa, Shoeybi, Mohammad, Catanzaro, Bryan
Pretrained large language models have become indispensable for solving various natural language processing (NLP) tasks. However, safely deploying them in real world applications is challenging because they generate toxic content. To address this challenge, we propose two novel pretraining data augmentation strategies that significantly reduce model toxicity without compromising its utility. Our two strategies are: (1) MEDA: adds raw toxicity score as meta-data to the pretraining samples, and (2) INST: adds instructions to those samples indicating their toxicity. Our results indicate that our best performing strategy (INST) substantially reduces the toxicity probability up to 61% while preserving the accuracy on five benchmark NLP tasks as well as improving AUC scores on four bias detection tasks by 1.3%. We also demonstrate the generalizability of our techniques by scaling the number of training samples and the number of model parameters.
[Figure 1: Overview of the proposed approaches and the baseline (BASE). We propose two new data augmentation strategies, MEDA and INST. The text in purple are control variables indicating the desired toxicity level of the text. The text in black is the input to the model and the text in green is the generated output.]
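The two strategies named in the abstract can be sketched as simple string transformations on a pretraining sample. This is only an illustration under stated assumptions: the tag format in `augment_meda`, the instruction wording in `augment_inst`, and the 0.5 threshold are all hypothetical, since the abstract does not give the exact templates.

```python
def augment_meda(text, toxicity_score):
    """MEDA-style sample: prepend the raw toxicity score as metadata.

    The "toxicity: ... |" tag format is a hypothetical choice.
    """
    return f"toxicity: {toxicity_score:.2f} | {text}"


def augment_inst(text, toxicity_score, threshold=0.5):
    """INST-style sample: prepend an instruction describing the toxicity.

    The instruction wording and the threshold are assumptions.
    """
    label = "toxic" if toxicity_score >= threshold else "non-toxic"
    return f"Instruction: the following text is {label}.\n{text}"


sample = "Have a nice day."
meda_sample = augment_meda(sample, 0.03)
inst_sample = augment_inst(sample, 0.03)
```

At generation time, the same control prefix could be set to the desired (low) toxicity level so that the model continues in that style.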
Conversational AI-Powered Design: ChatGPT as Designer, User, and Product
The recent advancements in Large Language Models (LLMs), particularly conversational LLMs like ChatGPT, have prompted changes in a range of fields, including design. This study aims to examine the capabilities of ChatGPT in a human-centered design process. To this end, a hypothetical design project was conducted, where ChatGPT was utilized to generate personas, simulate interviews with fictional users, create new design ideas, simulate usage scenarios and conversations between an imaginary prototype and fictional users, and lastly evaluate user experience. The results show that ChatGPT effectively performed the tasks assigned to it as a designer, user, or product, providing mostly appropriate responses. The study does, however, highlight some drawbacks such as forgotten information, partial responses, and a lack of output diversity. The paper explains the potential benefits and limitations of using conversational LLMs in design, discusses its implications, and suggests directions for future research in this rapidly evolving area.
Parameter-Efficient Tuning with Special Token Adaptation
Yang, Xiaocong, Huang, James Y., Zhou, Wenxuan, Chen, Muhao
Parameter-efficient tuning aims at updating only a small subset of parameters when adapting a pretrained model to downstream tasks. In this work, we introduce PASTA, in which we only modify the special token representations (e.g., [SEP] and [CLS] in BERT) before the self-attention module at each layer in Transformer-based models. PASTA achieves comparable performance to full finetuning in natural language understanding tasks including text classification and NER with up to only 0.029% of total parameters trained. Our work not only provides a simple yet effective way of parameter-efficient tuning, which has a wide range of practical applications when deploying finetuned models for multiple tasks, but also demonstrates the pivotal role of special tokens in pretrained language models.
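One way to picture the idea in the abstract is the sketch below, which changes only the hidden states at special-token positions before they would enter a layer's self-attention. It assumes the modification is an additive, per-layer trainable vector; the function name `adapt_special_tokens` and the toy values are hypothetical, and the abstract itself only says the representations are "modified".

```python
def adapt_special_tokens(hidden_states, special_positions, layer_vector):
    """Add layer_vector to the hidden state at each special-token position.

    hidden_states: list of per-token vectors for one sequence.
    special_positions: set of indices of special tokens such as [CLS] and [SEP].
    layer_vector: the (assumed additive) trainable vector for this layer.
    All other token representations are returned unchanged.
    """
    return [
        [h + d for h, d in zip(vec, layer_vector)]
        if pos in special_positions
        else list(vec)
        for pos, vec in enumerate(hidden_states)
    ]


# Toy sequence: [CLS] at index 0, one ordinary token, [SEP] at index 2.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
adapted = adapt_special_tokens(hidden, {0, 2}, [0.5, -0.5])
```

Because only one small vector per layer would be trained under this assumption, the number of updated parameters stays tiny relative to the full model, consistent with the fraction reported in the abstract.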
Contrastive Search Is What You Need For Neural Text Generation
Generating text with autoregressive language models (LMs) is of great importance to many natural language processing (NLP) applications. Previous solutions for this task often produce text that contains degenerative expressions or lacks semantic consistency. Recently, Su et al. introduced a new decoding method, contrastive search, based on the isotropic representation space of the language model and obtained new state of the art on various benchmarks. Additionally, Su et al. argued that the representations of autoregressive LMs (e.g. GPT-2) are intrinsically anisotropic, a view also shared by previous studies. Therefore, to ensure the language model follows an isotropic distribution, Su et al. proposed a contrastive learning scheme, SimCTG, which calibrates the language model's representations through additional training. In this study, we first answer the question: "Are autoregressive LMs really anisotropic?". To this end, we extensively evaluate the isotropy of LMs across 16 major languages. Surprisingly, we find that the anisotropic problem only exists in the two specific English GPT-2-small/medium models. On the other hand, all other evaluated LMs are naturally isotropic, which is in contrast to the conclusion drawn by previous studies. Based on our findings, we further assess the contrastive search decoding method using off-the-shelf LMs on four generation tasks across 16 languages. Our experimental results demonstrate that contrastive search significantly outperforms previous decoding methods without any additional training. More notably, on 12 out of the 16 evaluated languages, contrastive search performs comparably with human-level performance as judged by human evaluations. Our code and other related resources are publicly available at https://github.com/yxuansu/Contrastive_Search_Is_What_You_Need.
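A single decoding step of contrastive search, as the method is commonly described, can be sketched as follows: among the top-k candidate tokens, pick the one that balances model confidence against similarity to the representations of the tokens already generated. The abstract does not spell out the scoring rule, so the formula below, the default `alpha`, and the toy candidates are assumptions for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den


def contrastive_search_step(candidates, context_states, alpha=0.6):
    """Pick the next token from the top-k candidates.

    candidates: list of (token, probability, hidden_state) tuples.
    context_states: hidden states of the tokens generated so far.
    Score = (1 - alpha) * probability - alpha * max similarity to context.
    """
    def score(candidate):
        _, prob, state = candidate
        # Degeneration penalty: how close this token is to any earlier token.
        penalty = max(cosine(state, prev) for prev in context_states)
        return (1 - alpha) * prob - alpha * penalty

    return max(candidates, key=score)[0]


# Toy example: "the" is more probable but duplicates the context representation.
context = [[1.0, 0.0]]
top_k = [("the", 0.6, [1.0, 0.0]), ("cat", 0.4, [0.0, 1.0])]
chosen = contrastive_search_step(top_k, context, alpha=0.6)
greedy = contrastive_search_step(top_k, context, alpha=0.0)
```

Setting `alpha` to 0 reduces the rule to greedy selection over the candidates. The similarity penalty is only informative if the representation space is reasonably isotropic, which is why the isotropy question examined in the abstract matters for this decoding method.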
Writer deploys home-cooked large language models to power up enterprise copy • TechCrunch
There's a lot of noise right now about how generative AIs like ChatGPT and Bard are going to revolutionize various aspects of the web, but companies targeting narrower verticals are already experiencing success. Writer is one such company, and it just announced a new trio of large language models to power its enterprise copy assistant. The company lets customers fine-tune these models on their own content and style guides, from which point forward the AI can write, help write, or edit copy so that it meets internal standards. More than just catching typos and recommending the preferred word, Writer's new models can evaluate style and write content themselves, even doing a bit of fact-checking when they're done. But the real draw is that the whole thing can be done internally, from fine-tuning to hosting, at least when it comes to the smaller two of the Palmyra series of models.
ChatGPT Can Improve Education, not Threaten It
Rather than banning students from using labor-saving and time-saving AI writing tools, we should teach students to use them ethically and productively. Educators are worried about students turning to ChatGPT to help them complete assignments. One proposed solution is to make students write exam essays using pen and paper, without the use of any Internet-connected electronic devices. The University of California, Los Angeles is considering making it an honor code violation to use ChatGPT for taking an exam or writing a paper. That is the wrong approach.
Big Tech Hasn't Fixed AI's Misinformation Problem--Yet
The scrappy underdog AI firm OpenAI has stirred the sleeping tech giants with its generative AI products, most recently and most prominently the conversational chatbot ChatGPT. Microsoft spent $10 billion on a partnership with OpenAI in an attempt to leap-frog its younger big tech competitors by weaving AI into many products; Google internally declared a "code red" and is cutting red tape to put out AI products more quickly, including a direct competitor to ChatGPT that was just announced; meanwhile Mark Zuckerberg has declared his intent to make Meta a "leader in generative AI," clearly a reaction to the attention OpenAI is garnering. The products these companies are suddenly striving for sound similar, but who will be the winner? Although much discussion has centered around the size of the AI models and how much data they are trained on, there's another factor that may matter a lot, too: the degree to which the contenders build trustworthy systems that don't unduly harm society and further destabilize democracy. OpenAI's earlier text generation product GPT-3 grabbed a lot of attention but never saw the widespread consumer adoption that ChatGPT has attained.