 zerogen


GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation

arXiv.org Artificial Intelligence

Knowledge distillation from LLMs is essential for the efficient deployment of language models. Prior works have proposed data generation using LLMs for preparing distilled models. We argue that generating data with LLMs is prone to sampling mainly from the center of the original content distribution. This limitation hinders the distilled model from learning the true underlying data distribution and causes it to forget the tails of the distribution (samples with lower probability). To this end, we propose GOLD, a task-agnostic data generation and knowledge distillation framework, which employs an iterative out-of-distribution-guided feedback mechanism for the LLM. As a result, the generated data improves the generalizability of distilled models. An energy-based OOD evaluation approach is also introduced to deal with noisy generated data. Our extensive experiments on 10 different classification and sequence-to-sequence tasks in NLP show that GOLD outperforms prior art and the LLM by an average of 5% and 14%, respectively. We also show that the proposed method is applicable to less explored and novel tasks. The code is available.
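The abstract does not say which energy function GOLD uses. As a rough illustration only, the sketch below uses the logit-based free-energy score that is common in the OOD detection literature, where lower energy suggests a more in-distribution sample. The function names, the threshold, and the choice to keep low-energy samples are all assumptions made for this example, not details taken from the paper.

```python
import math

def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * log(sum(exp(logit / T))).

    Assumed form (not confirmed by the abstract). Lower values
    suggest the classifier sees the sample as more in-distribution.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp

def keep_low_energy(samples, logits_list, threshold):
    """Hypothetical filter: keep samples whose energy is at or below threshold."""
    return [
        sample
        for sample, logits in zip(samples, logits_list)
        if energy_score(logits) <= threshold
    ]
```

With two classes, uniform logits `[0.0, 0.0]` give an energy of `-log 2`, while confident logits such as `[10.0, 0.0]` give an energy close to `-10`, so the confident sample scores lower. Which side of the threshold to keep would depend on how the method separates useful tail samples from noise.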


Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models

arXiv.org Artificial Intelligence

Data Synthesis is a promising way to train a small model with very little labeled data. One approach for data synthesis is to leverage the rich knowledge from large language models to synthesize pseudo training examples for small models, making it possible to achieve both data and compute efficiency at the same time. However, a key challenge in data synthesis is that the synthesized dataset often suffers from a large distributional discrepancy from the real task data distribution. Thus, in this paper, we propose Synthesis Step by Step (S3), a data synthesis framework that shrinks this distribution gap by iteratively extrapolating the errors made by a small model trained on the synthesized dataset on a small real-world validation dataset using a large language model. Extensive experiments on multiple NLP tasks show that our approach improves the performance of a small model by reducing the gap between the synthetic ...

Figure 1: Training and testing accuracy of DistilBert with ZeroGen (Ye et al., 2022b) on the IMDb dataset with 200k training datapoints. Also shown are the training and testing accuracy of the model trained on GoldData. We can see here that ZeroGen's training accuracy quickly reaches nearly 100%, but testing accuracy remains low.
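The iterative loop described in the S3 abstract (train a small model on synthetic data, find its errors on a small real validation set, have an LLM extrapolate from those errors, repeat) can be sketched as follows. This is a toy sketch under stated assumptions: `stub_llm_extrapolate` is a hypothetical stand-in for the LLM call and only reorders words, and the small model is a word-count classifier chosen for brevity, not the model used in the paper.

```python
from collections import Counter

def train(dataset):
    """Toy small model: per-label word counts."""
    counts = {}
    for text, label in dataset:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def predict(model, text):
    """Pick the label whose training words overlap most with the text."""
    words = text.split()
    return max(sorted(model), key=lambda lab: sum(model[lab][w] for w in words))

def stub_llm_extrapolate(errors):
    """Hypothetical stand-in for the LLM step.

    A real system would prompt an LLM for new examples resembling the
    misclassified ones; here we only reverse the word order.
    """
    return [(" ".join(reversed(text.split())), label) for text, label in errors]

def synthesize_step_by_step(seed_data, validation, rounds=3):
    """Grow the synthetic dataset from errors on the validation set."""
    data = list(seed_data)
    for _ in range(rounds):
        model = train(data)
        errors = [(t, y) for t, y in validation if predict(model, t) != y]
        if not errors:
            break  # the small model fits the validation set
        data.extend(stub_llm_extrapolate(errors))
    return train(data), data
```

A usage example with made-up data: starting from `[("great fun", "pos"), ("dull slow", "neg")]` and the validation set `[("charming cast", "pos"), ("dull plot", "neg")]`, the first round misclassifies "charming cast", one extrapolated example is added, and the second round finds no errors.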


Overview -- ZeroGen, Efficient Zero-shot Learning via Dataset Generation

#artificialintelligence

An interesting take on zero-shot learning was introduced in a paper dated Feb 16. The authors explore more efficient and flexible ways to conduct zero-shot learning with PLMs. They take the dataset generation method to the extreme and study ZeroGen, a flexible and efficient zero-shot learning framework via dataset generation. Using the generated pseudo-dataset, a tiny task model (TAM) is trained to perform the given task. This procedure is highly flexible: any model architecture, loss function, and training strategy can be used.
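The two stages described above (generate a pseudo-dataset with a PLM, then train a tiny task model on it) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `stub_plm_generate` is a hypothetical stand-in that returns canned words instead of calling a real PLM, the prompt wording is invented, and the word-count classifier is just one possible TAM.

```python
import random
from collections import Counter

# Illustrative label-conditioned prompts (wording is an assumption).
PROMPTS = {
    "positive": 'The movie review in positive sentiment is: "',
    "negative": 'The movie review in negative sentiment is: "',
}

def stub_plm_generate(prompt, rng):
    """Hypothetical stand-in for a PLM; emits canned words per label."""
    vocab = {
        "positive": ["great", "wonderful", "moving", "fun"],
        "negative": ["dull", "boring", "awful", "slow"],
    }
    label = "positive" if "positive" in prompt else "negative"
    return " ".join(rng.choice(vocab[label]) for _ in range(5))

def build_pseudo_dataset(n_per_label, seed=0):
    """Stage 1: generate (text, label) pairs with no human-labeled data."""
    rng = random.Random(seed)
    return [
        (stub_plm_generate(prompt, rng), label)
        for label, prompt in PROMPTS.items()
        for _ in range(n_per_label)
    ]

def train_tam(dataset):
    """Stage 2: train a tiny task model (here, per-label word counts)."""
    counts = {}
    for text, label in dataset:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def predict(model, text):
    """Inference uses only the tiny model, not the PLM."""
    words = text.split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))
```

The design point is that the PLM is needed only while building the dataset. At inference time, `predict` touches just the small model, which is where the efficiency gain would come from.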


Meet ZEROGEN: An Extreme Method for Dataset Generation via PLMs for Zero-Shot Learning

#artificialintelligence

The impressive generative capacity of large-scale pretrained language models (PLMs) has inspired machine learning researchers to explore methods for generating model training examples via PLMs and data augmentation procedures, i.e., dataset generation. A novel contribution in this research direction is proposed in the new paper ZeroGen: Efficient Zero-shot Learning via Dataset Generation, from researchers at the University of Hong Kong, Shanghai AI Lab, Huawei Noah's Ark Lab and the University of Washington. The team describes their proposed ZEROGEN as an "extreme instance" of dataset generation via PLMs for zero-shot learning. ZEROGEN is a framework for prompt-based zero-shot learning (PROMPTING). Unlike existing approaches that rely on gigantic PLMs during inference, ZEROGEN introduces a more flexible and efficient approach for conducting zero-shot learning with PLMs.