 Large Language Model


Clip-Tuning: Towards Derivative-free Prompt Learning with a Mixture of Rewards

arXiv.org Artificial Intelligence

Derivative-free prompt learning has emerged as a lightweight alternative to prompt tuning, which only requires model inference to optimize the prompts. However, existing work did not take full advantage of the over-parameterized characteristics of large pre-trained language models (PLMs). In this paper, we propose Clip-Tuning, a simple yet effective method that adopts diverse frozen "thinned" networks of PLMs to obtain a mixture of rewards and thus advance derivative-free prompt learning. The thinned networks consist of all the hidden units that survive a stationary dropout strategy, and their inference predictions reflect an ensemble of partial views over prompted training samples. Our method outperforms previous gradient-free prompt learning methods and achieves parity with gradient-based counterparts on seven language understanding benchmarks under few-shot settings.
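The idea of scoring a prompt with several fixed dropout masks can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "model" is a simple linear scorer rather than a PLM, the function names are hypothetical, and plain random search stands in for whatever derivative-free optimizer the paper uses (the abstract does not name one).

```python
import random
from typing import List, Tuple

def make_masks(num_masks: int, hidden_size: int, keep_prob: float, seed: int = 0) -> List[List[float]]:
    """Sample dropout masks once and keep them fixed (the 'stationary' part)."""
    rng = random.Random(seed)
    return [[1.0 if rng.random() < keep_prob else 0.0 for _ in range(hidden_size)]
            for _ in range(num_masks)]

def thinned_reward(prompt: List[float], mask: List[float], weights: List[float], target: float) -> float:
    """Reward from one frozen thinned network: negative squared error of a toy scorer."""
    hidden = [p * w * m for p, w, m in zip(prompt, weights, mask)]
    return -(sum(hidden) - target) ** 2

def mixture_reward(prompt: List[float], masks: List[List[float]], weights: List[float], target: float) -> float:
    """Average the rewards over all thinned networks (the 'mixture of rewards')."""
    return sum(thinned_reward(prompt, m, weights, target) for m in masks) / len(masks)

def random_search(masks: List[List[float]], weights: List[float], target: float,
                  steps: int = 200, sigma: float = 0.1, seed: int = 1) -> Tuple[List[float], float]:
    """Inference-only hill climbing: keep a perturbed prompt only if its reward improves."""
    rng = random.Random(seed)
    best = [0.0] * len(weights)
    best_r = mixture_reward(best, masks, weights, target)
    for _ in range(steps):
        cand = [p + rng.gauss(0.0, sigma) for p in best]
        r = mixture_reward(cand, masks, weights, target)
        if r > best_r:
            best, best_r = cand, r
    return best, best_r
```

No gradients are computed anywhere: the optimizer only queries the reward, which is what makes the approach usable when the model is available for inference only.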


Late Prompt Tuning: A Late Prompt Could Be Better Than Many Prompts

arXiv.org Artificial Intelligence

Prompt tuning is a parameter-efficient tuning (PETuning) method for utilizing pre-trained models (PTMs) that simply prepends a soft prompt to the input and only optimizes the prompt to adapt PTMs to downstream tasks. Although it is parameter- and deployment-efficient, its performance still lags behind other state-of-the-art PETuning methods. Besides, the training cost of prompt tuning is not significantly reduced due to the back-propagation through the entire model. Through empirical analyses, we shed some light on the lagging performance of prompt tuning and recognize a trade-off between the propagation distance from label signals to the inserted prompt and the influence of the prompt on model outputs. Further, we present Late Prompt Tuning (LPT) that inserts a late prompt into an intermediate layer of the PTM instead of the input layer or all layers. The late prompt is obtained by a neural prompt generator conditioned on the hidden states before the prompt insertion layer and therefore is instance-dependent. Through extensive experimental results across various tasks and PTMs, we show that LPT can achieve competitive performance to full model tuning and other PETuning methods under both full-data and few-shot scenarios while possessing faster training speed and lower memory cost.
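A minimal sketch of inserting an instance-dependent prompt at an intermediate layer, assuming a toy model whose layers are plain functions over lists of vectors. The names `forward_with_late_prompt` and `mean_pool` are hypothetical, and mean pooling as the input to the prompt generator is an assumption; the abstract only says the generator is conditioned on the hidden states before the insertion layer.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]

def mean_pool(hidden: List[Vector]) -> Vector:
    """Average the hidden states over the sequence dimension."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]

def forward_with_late_prompt(
    tokens: List[Vector],
    layers: List[Layer],
    insert_at: int,
    prompt_generator: Callable[[Vector], List[Vector]],
) -> List[Vector]:
    """Run the layers in order, prepending a generated prompt before layer `insert_at`."""
    hidden = tokens
    for idx, layer in enumerate(layers):
        if idx == insert_at:
            # The prompt depends on this input's hidden states, so it is instance-dependent.
            prompt = prompt_generator(mean_pool(hidden))
            hidden = prompt + hidden
        hidden = layer(hidden)
    return hidden
```

Because the prompt enters at layer `insert_at`, a training signal for the generator would only need to flow back through the layers after that point, which is consistent with the lower training cost the abstract reports.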


Re3: Generating Longer Stories With Recursive Reprompting and Revision

arXiv.org Artificial Intelligence

We consider the problem of automatically generating longer stories of over two thousand words. Compared to prior work on shorter stories, long-range plot coherence and relevance are more central challenges here. We propose the Recursive Reprompting and Revision framework (Re3) to address these challenges by (a) prompting a general-purpose language model to construct a structured overarching plan, and (b) generating story passages by repeatedly injecting contextual information from both the plan and current story state into a language model prompt. We then revise by (c) reranking different continuations for plot coherence and premise relevance, and finally (d) editing the best continuation for factual consistency. Compared to similar-length stories generated directly from the same base model, human evaluators judged substantially more of Re3's stories as having a coherent overarching plot (by 14% absolute increase), and relevant to the given initial premise (by 20%).
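The four steps (a) to (d) can be sketched as a simple loop. This is an illustrative outline only: `re3_generate` and its callable parameters are hypothetical stand-ins for language-model calls, and the prompt layout is an assumption, not the format used in the paper.

```python
from typing import Callable, List

def re3_generate(
    premise: str,
    plan_fn: Callable[[str], str],
    draft_fn: Callable[[str, int], str],
    score_fn: Callable[[str, str, List[str], str], float],
    edit_fn: Callable[[str, List[str]], str],
    num_passages: int = 3,
    num_candidates: int = 4,
) -> List[str]:
    """Plan once, then repeatedly draft, rerank, and edit one passage at a time."""
    plan = plan_fn(premise)  # (a) build a structured overarching plan
    story: List[str] = []
    for _ in range(num_passages):
        # (b) re-inject the plan and the current story state into the prompt
        recent = " ".join(story[-2:])
        prompt = f"Premise: {premise}\nPlan: {plan}\nStory so far: {recent}"
        candidates = [draft_fn(prompt, i) for i in range(num_candidates)]
        # (c) rerank continuations for coherence and premise relevance
        best = max(candidates, key=lambda c: score_fn(premise, plan, story, c))
        # (d) edit the winning continuation for consistency with the story so far
        story.append(edit_fn(best, story))
    return story
```

Passing the model calls in as functions keeps the control flow testable with deterministic stubs before any real model is wired in.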


Microsoft in Advanced Talks to Increase Investment in OpenAI

WSJ.com: WSJD - Technology

Microsoft is in advanced talks for a new round of funding in OpenAI, according to a person familiar with the matter, as the software giant seeks to further incorporate artificial intelligence into its products. No deal has been reached between the two sides and the funding amount could vary as negotiations evolve, the person said. The companies have held talks in recent weeks, according to people familiar with the matter. Microsoft invested $1 billion in OpenAI in 2019. The new cash could help bankroll the tremendous computing power OpenAI needs to run its various artificial intelligence products on Azure, Microsoft's cloud computing service.


How Open Source is eating AI

#artificialintelligence

By August, it had been cloned in the open by two master's students as OpenGPT-2. By November, OpenAI released their 1.5B parameter model, after a cautious staged release process.
May 2020: OpenAI released GPT-3 as a paper and a closed beta API in June 2020.
Mar 2021: EleutherAI released their open GPT-Neo 1.3B and 2.7B models.
May 2022: Meta released OPT-175B for researchers (with logbook! and an open license).
The Text-to-Image cycle took 4? months:
Apr 2022: OpenAI announces DALL-E 2 with a limited "research preview".
The timelines above are highly cherrypicked of course; the story is much longer if you take into account the longer development history starting from the academic papers for diffusion (2015) and transformer models (2017) and older work on GANs. But what is more interesting is what has happened since: OpenAI's audio-to-text model, Whisper, was released under MIT license in September with no API paywall. Of course, there is less scope for abuse in the audio-to-text domain, but more than a few people have speculated that the reception to Stable Diffusion's release influenced the open sourcing decision. Sufficiently advanced community is indistinguishable from magic.


Generally Intelligent secures cash from OpenAI vets to build capable AI systems

#artificialintelligence

A new AI research company is launching out of stealth today with an ambitious goal: to research the fundamentals of human intelligence that machines currently lack. Called Generally Intelligent, it plans to do this by turning these fundamentals into an array of tasks to be solved and by designing and testing different systems' ability to learn to solve them in highly complex 3D worlds built by their team. "We believe that generally intelligent computers will someday unlock extraordinary potential for human creativity and insight," CEO Kanjun Qiu told TechCrunch in an email interview. "However, today's AI models are missing several key elements of human intelligence, which inhibits the development of general-purpose AI systems that can be deployed safely … Generally Intelligent's work aims to understand the fundamentals of human intelligence in order to engineer safe AI systems that can learn and understand the way humans do." Qiu, the former chief of staff at Dropbox and the co-founder of Ember Hardware, which designed laser displays for VR headsets, co-founded Generally Intelligent in 2021 after shutting down her previous startup, Sourceress, a recruiting company that used AI to scour the web.


Will Artificial Intelligence Ever Rival Human Thinking?

#artificialintelligence

Some of the world's most advanced artificial intelligence (AI) systems, at least the ones the public hear about, are famous for beating human players at chess or poker. Other algorithms are known for their ability to learn how to recognize cats or their inability to recognize people with darker skin. But are current AI systems anything more than toys? Sure, their ability to play games or identify animals is impressive, but does this help toward creating useful AI systems? To answer this, we need to take a step back and question what the goals of AI are.


Transformer-based Entity Typing in Knowledge Graphs

arXiv.org Artificial Intelligence

We investigate the knowledge graph entity typing task, which aims at inferring plausible entity types. In this paper, we propose a novel Transformer-based Entity Typing (TET) approach, effectively encoding the content of neighbors of an entity. More precisely, TET is composed of three different mechanisms: a local transformer that infers missing types of an entity by independently encoding the information provided by each of its neighbors; a global transformer that aggregates the information of all neighbors of an entity into a single long sequence to reason about more complex entity types; and a context transformer that integrates neighbors' content based on their contribution to the type inference through information exchange between neighbor pairs. Furthermore, TET uses information about class membership of types to semantically strengthen the representation of an entity. Experiments on two real-world datasets demonstrate the superior performance of TET compared to the state-of-the-art.


Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

arXiv.org Artificial Intelligence

Pre-trained language models (LMs), such as BERT (Devlin et al., 2018) and its variants, have led to significant improvements on various NLP tasks in past years. However, a theoretical framework for studying their relationships is still missing. In this paper, we fill this gap by investigating the linear dependency between pre-trained LMs. The linear dependency of LMs is defined analogously to the linear dependency of vectors. We propose Language Model Decomposition (LMD) to represent an LM using a linear combination of other LMs as a basis, and derive the closed-form solution. A goodness-of-fit metric for LMD similar to the coefficient of determination is defined and used to measure the linear dependency of a set of LMs. In experiments, we find that BERT and eleven (11) BERT-like LMs are 91% linearly dependent. This observation suggests that current state-of-the-art (SOTA) LMs are highly "correlated". To further advance SOTA, we need more diverse and novel LMs that are less dependent on existing LMs.
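One standard way to make "a linear combination of other LMs with a closed-form solution" concrete is ordinary least squares. The notation is assumed for illustration and may differ from the paper's: $Y \in \mathbb{R}^{n \times d}$ holds the target LM's representations of $n$ inputs, $X \in \mathbb{R}^{n \times m}$ concatenates the basis LMs' representations of the same inputs, and $\bar{Y}$ repeats the column means of $Y$ in every row.

```latex
% Assumed formalization: ordinary least squares, requiring X^T X to be invertible.
\hat{W} \;=\; \arg\min_{W} \,\lVert Y - XW \rVert_F^2 \;=\; (X^\top X)^{-1} X^\top Y,
\qquad
R^2 \;=\; 1 - \frac{\lVert Y - X\hat{W} \rVert_F^2}{\lVert Y - \bar{Y} \rVert_F^2}
```

Under this reading, an $R^2$ close to 1 means the target LM's representations are almost entirely explained by the basis LMs, which is the sense in which a figure such as "91% linearly dependent" could be interpreted.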


ObSynth: An Interactive Synthesis System for Generating Object Models from Natural Language Specifications

arXiv.org Artificial Intelligence

We introduce ObSynth, an interactive system leveraging the domain knowledge embedded in large language models (LLMs) to help users design object models from high-level natural language prompts. This is an example of specification reification, the process of taking a high-level, potentially vague specification and reifying it into a more concrete form. We evaluate ObSynth via a user study, leading to three key findings: first, object models designed using ObSynth are more detailed, showing that it often synthesizes fields users might have otherwise omitted. Second, a majority of objects, methods, and fields generated by ObSynth are kept by the user in the final object model, highlighting the quality of generated components. Third, ObSynth altered the workflow of participants: they focused on checking that synthesized components were correct rather than generating them from scratch, though ObSynth did not reduce the time participants took to generate object models.