AITopics | Large Language Model

Large Language Models (LLM): Top 3 of the Most Important Methods

#artificialintelligence | Jul-23-2022, 15:45:35 GMT

Large language models (LLMs) are sophisticated statistical models of natural language, applied to specific tasks such as machine translation, speech recognition, and text generation.

important method, language model, llm

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Google fired engineer who said its AI was sentient

Washington Post - Technology News | Jul-23-2022, 01:10:21 GMT

LaMDA utilizes Google's most advanced large language models, a type of AI that recognizes and generates text. These systems cannot understand language or meaning, researchers say. But they can produce deceptively humanlike speech because they are trained on massive amounts of data crawled from the internet to predict the next most likely word in a sentence.

google, large language model, natural language, (3 more...)

Washington Post - Technology News

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.41)

Google fires researcher who claimed LaMDA AI was sentient

Engadget | Jul-23-2022, 00:17:46 GMT

Blake Lemoine, an engineer who's spent the last seven years with Google, has been fired, reports Alex Kantrowitz of the Big Technology newsletter. The news was allegedly broken by Lemoine himself during a taping of the podcast of the same name, though the episode is not yet public. Google confirmed the firing to Engadget. Lemoine, who most recently was part of Google's Responsible AI project, went to the Washington Post last month with claims that one of the company's AI projects had gained sentience. The AI in question, LaMDA -- short for Language Model for Dialogue Applications -- was publicly unveiled by Google last year as a means for computers to better mimic open-ended conversation.

google, lemoine, sentient, (6 more...)

Engadget

Industry: Information Technology > Security & Privacy (0.81)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)

Robots Enact Malignant Stereotypes

Hundt, Andrew, Agnew, William, Zeng, Vicky, Kacianka, Severin, Gombolay, Matthew

arXiv.org Artificial Intelligence | Jul-23-2022

Stereotypes, bias, and discrimination have been extensively documented in Machine Learning (ML) methods such as Computer Vision (CV) [18, 80], Natural Language Processing (NLP) [6], or both, in the case of large image and caption models such as OpenAI CLIP [14]. In this paper, we evaluate how ML bias manifests in robots that physically and autonomously act within the world. We audit one of several recently published CLIP-powered robotic manipulation methods, presenting it with objects that have pictures of human faces on the surface which vary across race and gender, alongside task descriptions that contain terms associated with common stereotypes. Our experiments definitively show robots acting out toxic stereotypes with respect to gender, race, and scientifically-discredited physiognomy, at scale. Furthermore, the audited methods are less likely to recognize Women and People of Color. Our interdisciplinary sociotechnical analysis synthesizes across fields and applications such as Science Technology and Society (STS), Critical Studies, History, Safety, Robotics, and AI. We find that robots powered by large datasets and Dissolution Models (sometimes called "foundation models", e.g. CLIP) that contain humans risk physically amplifying malignant stereotypes in general; and that merely correcting disparities will be insufficient for the complexity and scale of the problem. Instead, we recommend that robot learning methods that physically manifest stereotypes or other harmful outcomes be paused, reworked, or even wound down when appropriate, until outcomes can be proven safe, effective, and just. Finally, we discuss comprehensive policy changes and the potential of new interdisciplinary research on topics like Identity Safety Assessment Frameworks and Design Justice to better understand and address these harms.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3531146.3533138

2207.11569

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > South Korea > Seoul > Seoul (0.06)
North America > United States > New York > New York County > New York City (0.05)
(20 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law Enforcement & Public Safety (1.00)
Information Technology (1.00)
Health & Medicine (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.46)

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework

Yao, Xingcheng, Zheng, Yanan, Yang, Xiaocong, Yang, Zhilin

arXiv.org Artificial Intelligence | Jul-22-2022

Pretrained language models have become the standard approach for many NLP tasks due to strong performance, but they are very expensive to train. We propose a simple and efficient learning framework, TLM, that does not rely on large-scale pretraining. Given some labeled task data and a large general corpus, TLM uses task data as queries to retrieve a tiny subset of the general corpus and jointly optimizes the task objective and the language modeling objective from scratch. On eight classification datasets in four domains, TLM achieves results better than or similar to pretrained language models (e.g., RoBERTa-Large) while reducing the training FLOPs by two orders of magnitude. With high accuracy and efficiency, we hope TLM will contribute to democratizing NLP and expediting its development.

computational linguistic, corpus, tlm, (15 more...)

arXiv.org Artificial Intelligence

2111.0413

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Maryland > Baltimore (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.67)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
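
The TLM abstract above describes using task data as queries to retrieve a small subset of a general corpus, then jointly optimizing a task objective and a language modeling objective. A minimal sketch of that idea follows, under clearly stated assumptions: plain token overlap stands in for whatever retriever the authors actually use, the weighted-sum form of the joint objective and the `lm_weight` parameter are illustrative, and every function name and the toy corpus are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, chosen for brevity)."""
    return text.lower().split()


def overlap_score(query_tokens, doc_tokens):
    """Number of shared tokens (multiset intersection); a crude retriever stand-in."""
    return sum((Counter(query_tokens) & Counter(doc_tokens)).values())


def retrieve_subset(task_texts, general_corpus, k=1):
    """Use each task example as a query and keep its top-k corpus documents."""
    selected = set()
    for query in task_texts:
        q = tokenize(query)
        ranked = sorted(
            range(len(general_corpus)),
            key=lambda i: overlap_score(q, tokenize(general_corpus[i])),
            reverse=True,
        )
        selected.update(ranked[:k])
    # Return documents in corpus order, without duplicates.
    return [general_corpus[i] for i in sorted(selected)]


def joint_loss(task_loss, lm_loss, lm_weight=1.0):
    """Hypothetical joint objective: task loss plus a weighted LM loss."""
    return task_loss + lm_weight * lm_loss


# Toy data, invented for illustration only.
task_texts = ["a great movie night", "terrible acting and plot"]
general_corpus = [
    "stock markets fell sharply today",
    "a great movie with a moving score",
    "the plot was thin and the acting terrible",
    "recipe for lemon cake",
]

subset = retrieve_subset(task_texts, general_corpus, k=1)
print(subset)
```

In this toy run only the two film-related documents are retained, which mirrors the abstract's point that training touches a tiny, task-relevant slice of the general corpus rather than all of it.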

PanGu-Coder: Program Synthesis with Function-Level Language Modeling

Christopoulou, Fenia, Lampouras, Gerasimos, Gritta, Milan, Zhang, Guchun, Guo, Yinpeng, Li, Zhongqi, Zhang, Qi, Xiao, Meng, Shen, Bo, Li, Lin, Yu, Hao, Yan, Li, Zhou, Pingyi, Wang, Xin, Ma, Yuchi, Iacobacci, Ignacio, Wang, Yasheng, Liang, Guangtai, Wei, Jiansheng, Jiang, Xin, Wang, Qianxiang, Liu, Qun

arXiv.org Artificial Intelligence | Jul-22-2022

We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e. the synthesis of programming language solutions given a natural language problem description. We train PanGu-Coder using a two-stage strategy: the first stage employs Causal Language Modelling (CLM) to pre-train on raw programming language data, while the second stage uses a combination of Causal Language Modelling and Masked Language Modelling (MLM) training objectives that focus on the downstream task of text-to-code generation and train on loosely curated pairs of natural language program definitions and code functions. Finally, we discuss PanGu-Coder-FT, which is fine-tuned on a combination of competitive programming problems and code with continuous integration tests. We evaluate PanGu-Coder with a focus on whether it generates functionally correct programs and demonstrate that it achieves equivalent or better performance than similarly sized models, such as CodeX, while attending a smaller context window and training on less data.

computational linguistic, dataset, objective, (15 more...)

arXiv.org Artificial Intelligence

2207.1128

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(6 more...)

Genre: Research Report (0.41)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
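
The PanGu-Coder abstract above names two training objectives, Causal Language Modelling (CLM) and Masked Language Modelling (MLM). The sketch below shows how input/target pairs are typically built for each, in their generic textbook forms. It is not the paper's exact recipe: the `<mask>` token, the 0.15 masking rate, the whitespace tokenization, and the function names are all assumptions made for illustration.

```python
import random

MASK = "<mask>"  # hypothetical mask token


def clm_example(tokens):
    """Causal LM: at each position, the target is the next token."""
    return tokens[:-1], tokens[1:]


def mlm_example(tokens, mask_prob=0.15, rng=None):
    """Masked LM: hide random tokens and predict only the hidden ones."""
    if rng is None:
        rng = random.Random(0)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)   # scored position
        else:
            inputs.append(tok)
            targets.append(None)  # position not scored
    return inputs, targets


tokens = "def add ( a , b ) : return a + b".split()

clm_inputs, clm_targets = clm_example(tokens)
mlm_inputs, mlm_targets = mlm_example(tokens, rng=random.Random(0))

print(list(zip(clm_inputs, clm_targets)))
print(list(zip(mlm_inputs, mlm_targets)))
```

The contrast to notice is that CLM scores every position but sees only the left context, while MLM scores only the masked positions but sees context on both sides.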

Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities

Shen, Zejiang, Lo, Kyle, Yu, Lauren, Dahlberg, Nathan, Schlanger, Margo, Downey, Doug

arXiv.org Artificial Intelligence | Jul-22-2022

With the advent of large language models, methods for abstractive summarization have made great strides, creating potential for use in applications to aid knowledge workers processing unwieldy document collections. One such setting is the Civil Rights Litigation Clearinghouse (CRLC) (https://clearinghouse.net), which posts information about large-scale civil rights lawsuits, serving lawyers, scholars, and the general public. Today, summarization in the CRLC requires extensive training of lawyers and law students who spend hours per case understanding multiple relevant documents in order to produce high-quality summaries of key events and outcomes. Motivated by this ongoing real-world summarization effort, we introduce Multi-LexSum, a collection of 9,280 expert-authored summaries drawn from ongoing CRLC writing. Multi-LexSum presents a challenging multi-document summarization task given the length of the source documents, often exceeding two hundred pages per case. Furthermore, Multi-LexSum is distinct from other datasets in its multiple target summaries, each at a different granularity (ranging from one-sentence "extreme" summaries to multi-paragraph narrations of over five hundred words). We present extensive analysis demonstrating that despite the high-quality summaries in the training data (adhering to strict content and style guidelines), state-of-the-art summarization models perform poorly on this task. We release Multi-LexSum for further research in summarization methods as well as to facilitate development of applications to assist in the CRLC's mission at https://multilexsum.github.io.

dataset, multi-lexsum, source document, (14 more...)

arXiv.org Artificial Intelligence

2206.10883

Country:

North America > United States > Michigan (0.05)
North America > United States > Ohio (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Law > Litigation (1.00)
Law > Civil Rights & Constitutional Law (1.00)
Government > Regional Government > North America Government > United States Government (0.67)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Scientist makes AI write academic paper about itself

#artificialintelligence | Jul-21-2022, 11:05:30 GMT

With minimal external inputs, OpenAI's GPT-3 text-generating algorithm has authored an academic paper about itself, resulting in a study that is being peer-reviewed. When Swedish researcher Almira Osmanovic Thunstrom commanded the text generator to write an academic thesis in 500 words about GPT-3, she "stood in awe" as the AI algorithm wrote a paper within two hours, complete with appropriate citations and contexts in places, she said in Scientific American. "As it started to generate text, I stood in awe. Here was novel content written in academic language, with well-grounded references cited in the right places and in relation to the right context," Dr Thunstrom noted.

make ai write academic paper, scientist make ai write, thunstrom, (1 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.33)

Leveraging Natural Supervision for Language Representation Learning and Generation

Chen, Mingda

arXiv.org Artificial Intelligence | Jul-21-2022

Recent breakthroughs in Natural Language Processing (NLP) have been driven by language models trained on a massive amount of plain text. While powerful, deriving supervision from textual resources is still an open question. For example, language model pretraining often neglects the rich, freely-available structures in textual data. In this thesis, we describe three lines of work that seek to improve the training and evaluation of neural models using naturally-occurring supervision. We first investigate self-supervised training losses to help enhance the performance of pretrained language models for various NLP tasks. Specifically, we alter the sentence prediction loss to make it better suited to other pretraining losses and more challenging to solve. We design an intermediate finetuning step that uses self-supervised training to promote models' ability in cross-task generalization. Then we describe methods to leverage the structures in Wikipedia and paraphrases. In particular, we propose training losses to exploit hyperlinks, article structures, and article category graphs for entity-, discourse-, entailment-related knowledge. We propose a framework that uses paraphrase pairs to disentangle semantics and syntax in sentence representations. We extend the framework for a novel generation task that controls the syntax of output text with a sentential exemplar. Lastly, we discuss our work on tailoring textual resources for establishing challenging evaluation tasks. We introduce three datasets by defining novel tasks using various fan-contributed websites, including a long-form data-to-text generation dataset, a screenplay summarization dataset, and a long-form story generation dataset. These datasets have unique characteristics offering challenges to future work in their respective task settings.

large language model, machine learning, natural language, (24 more...)

arXiv.org Artificial Intelligence

2207.10617

Country:

Europe > France (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Pennsylvania > Dauphin County > Harrisburg (0.04)
(14 more...)

Genre:

Research Report > New Finding (1.00)
Overview (0.93)

Industry:

Media > Television (1.00)
Media > Film (1.00)
Law (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

An Explanation of In-context Learning as Implicit Bayesian Inference

Xie, Sang Michael, Raghunathan, Aditi, Liang, Percy, Ma, Tengyu

arXiv.org Artificial Intelligence | Jul-21-2022

Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. The LM learns from these examples without being explicitly pretrained to learn. Thus, it is unclear what enables in-context learning. In this paper, we study how in-context learning can emerge when pretraining documents have long-range coherence. Here, the LM must infer a latent document-level concept to generate coherent next tokens during pretraining. At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt. We prove when this occurs despite a distribution mismatch between prompts and pretraining data in a setting where the pretraining distribution is a mixture of HMMs. In contrast to messy large-scale datasets used to train LMs capable of in-context learning, we generate a small-scale synthetic dataset (GINC) where Transformers and LSTMs both exhibit in-context learning. Beyond the theory, experiments on GINC exhibit large-scale real-world phenomena including improved in-context performance with model scaling (despite the same pretraining loss), sensitivity to example order, and instances where zero-shot is better than few-shot in-context learning.

in-context learning, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2111.0208

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
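
The in-context learning abstract above frames prompting as inferring a shared latent concept from the examples in a prompt. The toy sketch below illustrates that Bayesian reading only. It is heavily simplified: the paper's setting is a mixture of HMMs, whereas here each concept is an independent token distribution, and the two concepts, their probabilities, the uniform prior, and the function names are all invented for illustration.

```python
import math

# Hypothetical concepts: each is a simple distribution over two tokens.
concepts = {
    "A": {"x": 0.8, "y": 0.2},
    "B": {"x": 0.3, "y": 0.7},
}
prior = {"A": 0.5, "B": 0.5}


def posterior(prompt, concepts, prior):
    """p(concept | prompt), proportional to p(concept) * prod p(token | concept)."""
    log_post = {
        c: math.log(prior[c]) + sum(math.log(dist[t]) for t in prompt)
        for c, dist in concepts.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post.values())
    unnorm = {c: math.exp(lp - peak) for c, lp in log_post.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def predictive(token, prompt, concepts, prior):
    """Next-token probability, marginalizing over the inferred concept."""
    post = posterior(prompt, concepts, prior)
    return sum(post[c] * concepts[c][token] for c in concepts)


short_prompt = ["x"] * 3
long_prompt = ["x"] * 5

print(posterior(short_prompt, concepts, prior))
print(posterior(long_prompt, concepts, prior))
print(predictive("x", long_prompt, concepts, prior))
```

With more examples in the prompt, the posterior concentrates on the concept that best explains them, and the predictive distribution moves toward that concept's own distribution. No parameters are updated at any point; the adaptation comes entirely from conditioning on the prompt.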
