Large Language Model
Google fired engineer who said its AI was sentient
LaMDA utilizes Google's most advanced large language models, a type of AI that recognizes and generates text. These systems cannot understand language or meaning, researchers say. But they can produce deceptively humanlike speech because they are trained on massive amounts of data crawled from the internet to predict the next most likely word in a sentence.
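As a rough illustration of "predicting the next most likely word", the sketch below counts which word follows which in a tiny made-up corpus. It is a toy bigram counter, not how LaMDA or any large neural model is built; the function names and the example sentence are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen with one."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus; real systems train on vastly more text.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(most_likely_next(model, "the"))
```

In this corpus "the" is followed by "cat" twice and "mat" once, so the counter picks "cat". The prediction comes purely from co-occurrence statistics, with no representation of meaning.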
Google fires researcher who claimed LaMDA AI was sentient
Blake Lemoine, an engineer who's spent the last seven years with Google, has been fired, reports Alex Kantrowitz of the Big Technology newsletter. The news was allegedly broken by Lemoine himself during a taping of the podcast of the same name, though the episode is not yet public. Google confirmed the firing to Engadget. Lemoine, who most recently was part of Google's Responsible AI project, went to the Washington Post last month with claims that one of the company's AI projects had allegedly gained sentience. The AI in question, LaMDA -- short for Language Model for Dialogue Applications -- was publicly unveiled by Google last year as a means for computers to better mimic open-ended conversation.
Robots Enact Malignant Stereotypes
Hundt, Andrew, Agnew, William, Zeng, Vicky, Kacianka, Severin, Gombolay, Matthew
Stereotypes, bias, and discrimination have been extensively documented in Machine Learning (ML) methods such as Computer Vision (CV) [18, 80], Natural Language Processing (NLP) [6], or both, in the case of large image and caption models such as OpenAI CLIP [14]. In this paper, we evaluate how ML bias manifests in robots that physically and autonomously act within the world. We audit one of several recently published CLIP-powered robotic manipulation methods, presenting it with objects that have pictures of human faces on the surface which vary across race and gender, alongside task descriptions that contain terms associated with common stereotypes. Our experiments definitively show robots acting out toxic stereotypes with respect to gender, race, and scientifically-discredited physiognomy, at scale. Furthermore, the audited methods are less likely to recognize Women and People of Color. Our interdisciplinary sociotechnical analysis synthesizes across fields and applications such as Science Technology and Society (STS), Critical Studies, History, Safety, Robotics, and AI. We find that robots powered by large datasets and Dissolution Models (sometimes called "foundation models", e.g. CLIP) that contain humans risk physically amplifying malignant stereotypes in general; and that merely correcting disparities will be insufficient for the complexity and scale of the problem. Instead, we recommend that robot learning methods that physically manifest stereotypes or other harmful outcomes be paused, reworked, or even wound down when appropriate, until outcomes can be proven safe, effective, and just. Finally, we discuss comprehensive policy changes and the potential of new interdisciplinary research on topics like Identity Safety Assessment Frameworks and Design Justice to better understand and address these harms.
NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
Yao, Xingcheng, Zheng, Yanan, Yang, Xiaocong, Yang, Zhilin
Pretrained language models have become the standard approach for many NLP tasks due to strong performance, but they are very expensive to train. We propose a simple and efficient learning framework, TLM, that does not rely on large-scale pretraining. Given some labeled task data and a large general corpus, TLM uses task data as queries to retrieve a tiny subset of the general corpus and jointly optimizes the task objective and the language modeling objective from scratch. On eight classification datasets in four domains, TLM achieves results better than or similar to pretrained language models (e.g., RoBERTa-Large) while reducing the training FLOPs by two orders of magnitude. With high accuracy and efficiency, we hope TLM will contribute to democratizing NLP and expediting its development.
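The two steps described in the abstract (task data as retrieval queries, then a joint objective) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the token-overlap scorer stands in for whatever retriever is actually used, and `retrieve_subset`, `joint_loss`, `k_per_query`, and `lm_weight` are hypothetical names.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def overlap_score(query_tokens, doc_tokens):
    """Count tokens shared by query and document (a stand-in for a real retriever)."""
    q, d = Counter(query_tokens), Counter(doc_tokens)
    return sum(min(q[t], d[t]) for t in q)

def retrieve_subset(task_texts, corpus, k_per_query=1):
    """Use each task example as a query; keep the union of its top-k corpus hits."""
    selected = set()
    for text in task_texts:
        q = tokenize(text)
        ranked = sorted(
            range(len(corpus)),
            key=lambda i: overlap_score(q, tokenize(corpus[i])),
            reverse=True,
        )
        selected.update(ranked[:k_per_query])
    return sorted(selected)

def joint_loss(task_loss, lm_loss, lm_weight=1.0):
    """Combine the supervised task loss with a language modeling loss (weight is assumed)."""
    return task_loss + lm_weight * lm_loss

# Hypothetical toy data: two labeled task inputs and a small "general corpus".
task_texts = ["the movie was great", "terrible plot and acting"]
corpus = [
    "stock prices fell sharply today",
    "a great movie with a moving score",
    "terrible acting and a thin plot",
    "recipe for tomato soup",
]
subset = retrieve_subset(task_texts, corpus)
print(subset, joint_loss(0.7, 2.0, lm_weight=0.5))
```

Only the two review-like documents are selected here, which mirrors the idea of training from scratch on a small, task-relevant slice of the general corpus rather than on all of it.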
PanGu-Coder: Program Synthesis with Function-Level Language Modeling
Christopoulou, Fenia, Lampouras, Gerasimos, Gritta, Milan, Zhang, Guchun, Guo, Yinpeng, Li, Zhongqi, Zhang, Qi, Xiao, Meng, Shen, Bo, Li, Lin, Yu, Hao, Yan, Li, Zhou, Pingyi, Wang, Xin, Ma, Yuchi, Iacobacci, Ignacio, Wang, Yasheng, Liang, Guangtai, Wei, Jiansheng, Jiang, Xin, Wang, Qianxiang, Liu, Qun
We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e. the synthesis of programming language solutions given a natural language problem description. We train PanGu-Coder using a two-stage strategy: the first stage employs Causal Language Modelling (CLM) to pre-train on raw programming language data, while the second stage uses a combination of Causal Language Modelling and Masked Language Modelling (MLM) training objectives that focus on the downstream task of text-to-code generation and train on loosely curated pairs of natural language program definitions and code functions. Finally, we discuss PanGu-Coder-FT, which is fine-tuned on a combination of competitive programming problems and code with continuous integration tests. We evaluate PanGu-Coder with a focus on whether it generates functionally correct programs and demonstrate that it achieves equivalent or better performance than similarly sized models, such as CodeX, while attending a smaller context window and training on less data.
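To make the two training objectives named above concrete, the sketch below builds one training example for each from the same token sequence. It shows the generic CLM and MLM formulations only; how PanGu-Coder actually combines them in its second stage is not specified here, and the `<mask>` token, the 15% masking rate, and the function names are assumptions.

```python
import random

MASK = "<mask>"  # hypothetical mask token

def clm_example(tokens):
    """Causal LM: at each position, the target is the next token (inputs shifted by one)."""
    return tokens[:-1], tokens[1:]

def mlm_example(tokens, mask_prob=0.15, seed=0):
    """Masked LM: hide a random subset of tokens and predict only the hidden ones."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)   # loss is computed at this position
        else:
            inputs.append(tok)
            targets.append(None)  # no loss at unmasked positions
    return inputs, targets

tokens = ["def", "add", "(", "a", ",", "b", ")"]
clm_inputs, clm_targets = clm_example(tokens)
mlm_inputs, mlm_targets = mlm_example(tokens, mask_prob=0.3, seed=1)
print(clm_inputs, clm_targets)
print(mlm_inputs, mlm_targets)
```

The causal objective sees only the left context at each step, while the masked objective sees both sides of each hidden token.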
Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities
Shen, Zejiang, Lo, Kyle, Yu, Lauren, Dahlberg, Nathan, Schlanger, Margo, Downey, Doug
With the advent of large language models, methods for abstractive summarization have made great strides, creating potential for use in applications to aid knowledge workers processing unwieldy document collections. One such setting is the Civil Rights Litigation Clearinghouse (CRLC) (https://clearinghouse.net), which posts information about large-scale civil rights lawsuits, serving lawyers, scholars, and the general public. Today, summarization in the CRLC requires extensive training of lawyers and law students who spend hours per case understanding multiple relevant documents in order to produce high-quality summaries of key events and outcomes. Motivated by this ongoing real-world summarization effort, we introduce Multi-LexSum, a collection of 9,280 expert-authored summaries drawn from ongoing CRLC writing. Multi-LexSum presents a challenging multi-document summarization task given the length of the source documents, often exceeding two hundred pages per case. Furthermore, Multi-LexSum is distinct from other datasets in its multiple target summaries, each at a different granularity (ranging from one-sentence "extreme" summaries to multi-paragraph narrations of over five hundred words). We present extensive analysis demonstrating that despite the high-quality summaries in the training data (adhering to strict content and style guidelines), state-of-the-art summarization models perform poorly on this task. We release Multi-LexSum for further research in summarization methods as well as to facilitate development of applications to assist in the CRLC's mission at https://multilexsum.github.io.
Scientist makes AI write academic paper about itself
With minimal external inputs, OpenAI's GPT-3 text generating algorithm has authored an academic paper about itself, resulting in a study that is being peer-reviewed. When Swedish researcher Almira Osmanovic Thunstrom commanded the text generator to write an academic thesis in 500 words about GPT-3, she "stood in awe" as the AI algorithm wrote a paper within two hours, complete with appropriate citations and contexts in places, she said in Scientific American. "As it started to generate text, I stood in awe. Here was novel content written in academic language, with well-grounded references cited in the right places and in relation to the right context," Dr Thunstrom noted.
Leveraging Natural Supervision for Language Representation Learning and Generation
Recent breakthroughs in Natural Language Processing (NLP) have been driven by language models trained on a massive amount of plain text. While powerful, deriving supervision from textual resources is still an open question. For example, language model pretraining often neglects the rich, freely-available structures in textual data. In this thesis, we describe three lines of work that seek to improve the training and evaluation of neural models using naturally-occurring supervision. We first investigate self-supervised training losses to help enhance the performance of pretrained language models for various NLP tasks. Specifically, we alter the sentence prediction loss to make it better suited to other pretraining losses and more challenging to solve. We design an intermediate finetuning step that uses self-supervised training to promote models' ability in cross-task generalization. Then we describe methods to leverage the structures in Wikipedia and paraphrases. In particular, we propose training losses to exploit hyperlinks, article structures, and article category graphs for entity-, discourse-, entailment-related knowledge. We propose a framework that uses paraphrase pairs to disentangle semantics and syntax in sentence representations. We extend the framework for a novel generation task that controls the syntax of output text with a sentential exemplar. Lastly, we discuss our work on tailoring textual resources for establishing challenging evaluation tasks. We introduce three datasets by defining novel tasks using various fan-contributed websites, including a long-form data-to-text generation dataset, a screenplay summarization dataset, and a long-form story generation dataset. These datasets have unique characteristics offering challenges to future work in their respective task settings.
An Explanation of In-context Learning as Implicit Bayesian Inference
Xie, Sang Michael, Raghunathan, Aditi, Liang, Percy, Ma, Tengyu
Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. The LM learns from these examples without being explicitly pretrained to learn. Thus, it is unclear what enables in-context learning. In this paper, we study how in-context learning can emerge when pretraining documents have long-range coherence. Here, the LM must infer a latent document-level concept to generate coherent next tokens during pretraining. At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt. We prove when this occurs despite a distribution mismatch between prompts and pretraining data in a setting where the pretraining distribution is a mixture of HMMs. In contrast to messy large-scale datasets used to train LMs capable of in-context learning, we generate a small-scale synthetic dataset (GINC) where Transformers and LSTMs both exhibit in-context learning. Beyond the theory, experiments on GINC exhibit large-scale real-world phenomena including improved in-context performance with model scaling (despite the same pretraining loss), sensitivity to example order, and instances where zero-shot is better than few-shot in-context learning.
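The "shared latent concept" idea can be illustrated with a toy Bayesian update. The sketch below is a deliberate simplification: each concept is an independent categorical distribution over tokens rather than an HMM as in the paper's setting, and the concept names, probabilities, and function names are assumptions made up for the example.

```python
import math

def posterior(prior, likelihoods, observed):
    """p(concept | observed) is proportional to p(concept) * product of p(token | concept)."""
    log_post = {
        c: math.log(p) + sum(math.log(likelihoods[c][t]) for t in observed)
        for c, p in prior.items()
    }
    m = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(v - m) for c, v in log_post.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

def predictive(post, likelihoods, token):
    """Next-token probability, averaging over concepts weighted by the posterior."""
    return sum(post[c] * likelihoods[c][token] for c in post)

# Hypothetical setup: two latent concepts with different token preferences.
prior = {"A": 0.5, "B": 0.5}
likelihoods = {
    "A": {"x": 0.9, "y": 0.1},
    "B": {"x": 0.2, "y": 0.8},
}
prompt = ["x", "x", "x"]  # in-context examples that all look like concept A
post = posterior(prior, likelihoods, prompt)
print(post, predictive(post, likelihoods, "x"))
```

With no prompt, the predicted probability of "x" is the prior average of the two concepts. After three consistent examples the posterior concentrates on concept A, and the prediction moves close to that concept's own distribution, without any parameter being updated.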