AITopics | Large Language Model

Large Language Model

Clip-Tuning: Towards Derivative-free Prompt Learning with a Mixture of Rewards

Chai, Yekun, Wang, Shuohuan, Sun, Yu, Tian, Hao, Wu, Hua, Wang, Haifeng

arXiv.org Artificial Intelligence, Oct-21-2022

Derivative-free prompt learning has emerged as a lightweight alternative to prompt tuning: it requires only model inference to optimize the prompts. However, existing work did not take full advantage of the over-parameterized characteristics of large pre-trained language models (PLMs). In this paper, we propose Clip-Tuning, a simple yet effective method that adopts diverse frozen "thinned" networks of PLMs to obtain a mixture of rewards and thus advance derivative-free prompt learning. The thinned networks consist of all the hidden units that survive a stationary dropout strategy, and their inference predictions reflect an ensemble of partial views over prompted training samples. Our method outperforms previous gradient-free prompt learning methods and achieves parity with gradient-based counterparts on seven language understanding benchmarks under few-shot settings.
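The idea in this abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the "model" is a plain linear scorer, the function names (`make_masks`, `mixture_reward`, `optimize_prompt`) are hypothetical, and simple random-perturbation hill climbing stands in for whatever derivative-free optimizer the authors actually use. The only points it tries to show are that the dropout masks are sampled once and then held fixed, and that the prompt is scored by averaging the rewards of the resulting thinned networks using inference only.

```python
import random


def make_masks(num_units, num_networks, keep_prob, rng):
    """Sample the dropout masks once; each mask defines one frozen thinned network."""
    return [
        [1.0 if rng.random() < keep_prob else 0.0 for _ in range(num_units)]
        for _ in range(num_networks)
    ]


def thinned_reward(prompt, weights, mask, target):
    """Reward from one thinned network: negative squared error of its masked output."""
    output = sum(p * w * m for p, w, m in zip(prompt, weights, mask))
    return -(output - target) ** 2


def mixture_reward(prompt, weights, masks, target):
    """Average the rewards of all thinned networks (a toy 'mixture of rewards')."""
    return sum(thinned_reward(prompt, weights, m, target) for m in masks) / len(masks)


def optimize_prompt(weights, masks, target, steps=200, sigma=0.1, seed=0):
    """Gradient-free hill climbing: keep a random perturbation only if the reward improves."""
    rng = random.Random(seed)
    prompt = [0.0] * len(weights)
    best = mixture_reward(prompt, weights, masks, target)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, sigma) for p in prompt]
        reward = mixture_reward(candidate, weights, masks, target)
        if reward > best:
            prompt, best = candidate, reward
    return prompt, best


if __name__ == "__main__":
    rng = random.Random(42)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # toy frozen model
    masks = make_masks(num_units=8, num_networks=4, keep_prob=0.8, rng=rng)
    prompt, reward = optimize_prompt(weights, masks, target=1.0)
    print("final mixture reward:", reward)
```

Because the masks are created before optimization starts and never resampled, every candidate prompt is judged by the same set of thinned networks, which is what "stationary" is taken to mean in this sketch.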

large language model, machine learning, subnetwork, (19 more...)

2210.1205

Country:

North America > United States > District of Columbia > Washington (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Late Prompt Tuning: A Late Prompt Could Be Better Than Many Prompts

Liu, Xiangyang, Sun, Tianxiang, Huang, Xuanjing, Qiu, Xipeng

arXiv.org Artificial Intelligence, Oct-21-2022

Prompt tuning is a parameter-efficient tuning (PETuning) method for utilizing pre-trained models (PTMs) that simply prepends a soft prompt to the input and only optimizes the prompt to adapt PTMs to downstream tasks. Although it is parameter- and deployment-efficient, its performance still lags behind other state-of-the-art PETuning methods. Moreover, the training cost of prompt tuning is not significantly reduced, because back-propagation still runs through the entire model. Through empirical analyses, we shed some light on the lagging performance of prompt tuning and recognize a trade-off between the propagation distance from label signals to the inserted prompt and the influence of the prompt on model outputs. Further, we present Late Prompt Tuning (LPT), which inserts a late prompt into an intermediate layer of the PTM instead of the input layer or all layers. The late prompt is obtained by a neural prompt generator conditioned on the hidden states before the prompt insertion layer and is therefore instance-dependent. Through extensive experimental results across various tasks and PTMs, we show that LPT can achieve performance competitive with full model tuning and other PETuning methods under both full-data and few-shot scenarios while training faster and using less memory.

large language model, machine learning, natural language, (19 more...)

2210.11292

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
(14 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Re3: Generating Longer Stories With Recursive Reprompting and Revision

Yang, Kevin, Tian, Yuandong, Peng, Nanyun, Klein, Dan

arXiv.org Artificial Intelligence, Oct-21-2022

We consider the problem of automatically generating longer stories of over two thousand words. Compared to prior work on shorter stories, long-range plot coherence and relevance are more central challenges here. We propose the Recursive Reprompting and Revision framework (Re3) to address these challenges by (a) prompting a general-purpose language model to construct a structured overarching plan, and (b) generating story passages by repeatedly injecting contextual information from both the plan and current story state into a language model prompt. We then revise by (c) reranking different continuations for plot coherence and premise relevance, and finally (d) editing the best continuation for factual consistency. Compared to similar-length stories generated directly from the same base model, human evaluators judged substantially more of Re3's stories as having a coherent overarching plot (by 14% absolute increase), and relevant to the given initial premise (by 20%).

large language model, machine learning, natural language, (18 more...)

2210.06774

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York (0.04)
(8 more...)

Genre:

Research Report (1.00)
Personal > Interview (1.00)

Industry:

Transportation (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Microsoft in Advanced Talks to Increase Investment in OpenAI

WSJ.com: WSJD - Technology, Oct-20-2022, 21:34:00 GMT

Microsoft is in advanced talks for a new round of funding in OpenAI, according to a person familiar with the matter, as the software giant seeks to further incorporate artificial intelligence into its products. No deal has been reached between the two sides and the funding amount could vary as negotiations evolve, the person said. The companies have held talks in recent weeks, according to people familiar with the matter. Microsoft invested $1 billion in OpenAI in 2019. The new cash could help bankroll the tremendous computing power OpenAI needs to run its various artificial intelligence products on Azure, Microsoft's cloud computing service.

increase investment, microsoft, openai, (4 more...)

Industry: Information Technology > Software (0.38)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

How Open Source is eating AI

#artificialintelligence, Oct-20-2022, 16:35:18 GMT

By August, it had been cloned in the open by two master's students as OpenGPT-2. By November, OpenAI released their 1.5B parameter model, after a cautious staged release process. May 2020: OpenAI released GPT-3 as a paper and a closed beta API in June 2020. Mar 2021: EleutherAI released their open GPT-Neo 1.3B and 2.7B models. May 2022: Meta released OPT-175B for researchers (with logbook! and an open license). The Text-to-Image cycle took 4? months: Apr 2022: OpenAI announces DALL-E 2 with a limited "research preview". The timelines above are highly cherrypicked of course; the story is much longer if you take into account the longer development history starting from the academic papers for diffusion (2015) and transformer models (2017) and older work on GANs. But what is more interesting is what has happened since: OpenAI's audio-to-text model, Whisper, was released under MIT license in September with no API paywall. Of course, there is less scope for abuse in the audio-to-text domain, but more than a few people have speculated that the reception to Stable Diffusion's release influenced the open sourcing decision. Sufficiently advanced community is indistinguishable from magic.

license, open source, stable diffusion, (14 more...)

Country:

Europe > Italy (0.04)
Asia > Singapore (0.04)
Asia > India (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Generally Intelligent secures cash from OpenAI vets to build capable AI systems

#artificialintelligence, Oct-20-2022, 16:35:09 GMT

A new AI research company is launching out of stealth today with an ambitious goal: to research the fundamentals of human intelligence that machines currently lack. Called Generally Intelligent, it plans to do this by turning these fundamentals into an array of tasks to be solved and by designing and testing different systems' ability to learn to solve them in highly complex 3D worlds built by their team. "We believe that generally intelligent computers will someday unlock extraordinary potential for human creativity and insight," CEO Kanjun Qiu told TechCrunch in an email interview. "However, today's AI models are missing several key elements of human intelligence, which inhibits the development of general-purpose AI systems that can be deployed safely … Generally Intelligent's work aims to understand the fundamentals of human intelligence in order to engineer safe AI systems that can learn and understand the way humans do." Qiu, the former chief of staff at Dropbox and the co-founder of Ember Hardware, which designed laser displays for VR headsets, co-founded Generally Intelligent in 2021 after shutting down her previous startup, Sourceress, a recruiting company that used AI to scour the web.

agent, ai system, intelligence, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Creativity & Intelligence (0.98)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

Will Artificial Intelligence Ever Rival Human Thinking?

#artificialintelligence, Oct-20-2022, 14:46:22 GMT

Some of the world's most advanced artificial intelligence (AI) systems, at least the ones the public hear about, are famous for beating human players at chess or poker. Other algorithms are known for their ability to learn how to recognize cats or their inability to recognize people with darker skin. But are current AI systems anything more than toys? Sure, their ability to play games or identify animals is impressive, but does this help toward creating useful AI systems? To answer this, we need to take a step back and question what the goals of AI are.

algorithm, intelligence, rival human thinking, (12 more...)

Genre: Personal (0.36)

Industry: Leisure & Entertainment > Games (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)

Transformer-based Entity Typing in Knowledge Graphs

Hu, Zhiwei, Gutiérrez-Basulto, Víctor, Xiang, Zhiliang, Li, Ru, Pan, Jeff Z.

arXiv.org Artificial Intelligence, Oct-20-2022

We investigate the knowledge graph entity typing task, which aims at inferring plausible entity types. In this paper, we propose a novel Transformer-based Entity Typing (TET) approach that effectively encodes the content of an entity's neighbors. More precisely, TET is composed of three different mechanisms: a local transformer that infers missing types of an entity by independently encoding the information provided by each of its neighbors; a global transformer that aggregates the information of all neighbors of an entity into a single long sequence to reason about more complex entity types; and a context transformer that integrates neighbors' content based on their contribution to the type inference through information exchange between neighbor pairs. Furthermore, TET uses information about class membership of types to semantically strengthen the representation of an entity. Experiments on two real-world datasets demonstrate the superior performance of TET compared to the state of the art.

large language model, machine learning, natural language, (20 more...)

2210.11151

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Dominican Republic (0.04)
(21 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

Zhang, Hao

arXiv.org Artificial Intelligence, Oct-20-2022

Pre-trained language models (LMs), such as BERT (Devlin et al., 2018) and its variants, have led to significant improvements on various NLP tasks in past years. However, a theoretical framework for studying their relationships is still missing. In this paper, we fill this gap by investigating the linear dependency between pre-trained LMs. The linear dependency of LMs is defined analogously to the linear dependency of vectors. We propose Language Model Decomposition (LMD) to represent an LM using a linear combination of other LMs as a basis, and derive the closed-form solution. A goodness-of-fit metric for LMD similar to the coefficient of determination is defined and used to measure the linear dependency of a set of LMs. In experiments, we find that BERT and eleven (11) BERT-like LMs are 91% linearly dependent. This observation suggests that current state-of-the-art (SOTA) LMs are highly "correlated". To further advance SOTA we need more diverse and novel LMs that are less dependent on existing LMs.
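As a rough illustration of what a closed-form linear decomposition with an R²-style fit score can look like, here is a minimal sketch under strong simplifying assumptions that are not from the paper: each LM is reduced to one flat list of numbers (for example, its outputs on a fixed set of inputs), the combination uses one scalar weight per basis LM, and the fit is ordinary least squares solved through the normal equations. The names `decompose` and `solve_linear_system` are hypothetical, and the paper's actual formulation and metric may differ.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("basis vectors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def decompose(target, basis):
    """Least-squares weights for target ~ sum_i w_i * basis[i], plus an R^2-style score."""
    gram = [[dot(u, v) for v in basis] for u in basis]   # X^T X
    rhs = [dot(u, target) for u in basis]                # X^T y
    weights = solve_linear_system(gram, rhs)             # (X^T X)^{-1} X^T y
    fitted = [sum(w * u[j] for w, u in zip(weights, basis)) for j in range(len(target))]
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    mean = sum(target) / len(target)
    ss_tot = sum((t - mean) ** 2 for t in target)
    return weights, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    basis = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # two toy "basis LMs"
    target = [2.0, 3.0, 2.0, 3.0]                          # toy "target LM"
    weights, r2 = decompose(target, basis)
    print(weights, r2)
```

A score near 1 means the target is almost fully explained by the basis in this toy setting; a lower score means a larger part of the target lies outside the span of the basis vectors.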

large language model, machine learning, natural language, (13 more...)

2210.10289

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.31)

ObSynth: An Interactive Synthesis System for Generating Object Models from Natural Language Specifications

Gu, Alex, Mitrovska, Tamara, Velez, Daniela, Andreas, Jacob, Solar-Lezama, Armando

arXiv.org Artificial Intelligence, Oct-20-2022

We introduce ObSynth, an interactive system leveraging the domain knowledge embedded in large language models (LLMs) to help users design object models from high-level natural language prompts. This is an example of specification reification, the process of taking a high-level, potentially vague specification and reifying it into a more concrete form. We evaluate ObSynth via a user study, leading to three key findings. First, object models designed using ObSynth are more detailed, showing that it often synthesizes fields users might have otherwise omitted. Second, a majority of objects, methods, and fields generated by ObSynth are kept by the user in the final object model, highlighting the quality of generated components. Third, ObSynth altered the workflow of participants: they focused on checking that synthesized components were correct rather than generating them from scratch, though ObSynth did not reduce the time participants took to generate object models.

large language model, natural language, obsynth, (18 more...)

2210.11468

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Vietnam (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)

Genre: Research Report > Experimental Study (0.73)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
