
Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples

Neural Information Processing Systems




DsDm: Model-Aware Dataset Selection with Datamodels

Engstrom, Logan, Feldmann, Axel, Madry, Aleksander

arXiv.org Artificial Intelligence

When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior. However, in practice the opposite can often happen: we find that selecting according to similarity with "high quality" data sources may not increase (and can even hurt) performance compared to randomly selecting data. To develop better methods for selecting data, we start by framing dataset selection as an optimization problem that we can directly solve for: given target tasks, a learning algorithm, and candidate data, select the subset that maximizes model performance. This framework thus avoids handpicked notions of data quality, and instead explicitly models how the learning process uses training datapoints to predict on the target tasks. Our resulting method greatly improves language model (LM) performance on both pre-specified tasks and previously unseen tasks. Specifically, choosing target tasks representative of standard LM problems and evaluating on diverse held-out benchmarks, our selected datasets provide a 2x compute multiplier over baseline methods.
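The optimization framing above can be sketched in a few lines. Assuming datamodel-style per-example influence scores are already available (the function name, the scores, and the linear "sum of per-example contributions" approximation below are illustrative assumptions, not the authors' implementation), selection reduces to taking the top-k candidates by their estimated contribution to target-task performance:

```python
# Hedged sketch of model-aware dataset selection (illustrative only).
# Assumes a linear datamodel approximation: predicted target-task performance
# is a sum of per-candidate influence scores, so the optimal size-k subset is
# simply the k highest-scoring candidates.

def select_subset(influence, k):
    """Return indices of the k candidates with the largest estimated
    contribution to target-task performance."""
    ranked = sorted(range(len(influence)), key=lambda i: influence[i], reverse=True)
    return sorted(ranked[:k])

# Toy example: positive scores mark examples predicted to help the target tasks.
scores = [0.9, -0.2, 0.4, 0.0, 0.7]
print(select_subset(scores, 3))  # -> [0, 2, 4]
```

In the actual method the hard part is estimating those influences; this sketch only shows why, once they exist, "selection as optimization" sidesteps handpicked quality notions entirely.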


Feature Likelihood Score: Evaluating the Generalization of Generative Models Using Samples

Jiralerspong, Marco, Bose, Avishek Joey, Gemp, Ian, Qin, Chongli, Bachrach, Yoram, Gidel, Gauthier

arXiv.org Artificial Intelligence

The past few years have seen impressive progress in the development of deep generative models capable of producing high-dimensional, complex, and photo-realistic data. However, current methods for evaluating such models remain incomplete: standard likelihood-based metrics do not always apply and rarely correlate with perceptual fidelity, while sample-based metrics, such as FID, are insensitive to overfitting, i.e., inability to generalize beyond the training set. To address these limitations, we propose a new metric called the Feature Likelihood Score (FLS), a parametric sample-based score that uses density estimation to provide a comprehensive trichotomic evaluation accounting for novelty (i.e., different from the training samples), fidelity, and diversity of generated samples. We empirically demonstrate the ability of FLS to identify specific overfitting problem cases, where previously proposed metrics fail. We also extensively evaluate FLS on various image datasets and model classes, demonstrating its ability to match intuitions of previous metrics like FID while offering a more comprehensive evaluation of generative models. Code is available at https://github.com/marcojira/fls.
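As a rough illustration of a parametric, sample-based score of the kind the abstract describes, the sketch below fits a Gaussian kernel density estimate to generated-sample features and evaluates held-out test features under it; memorized generations that merely copy training points would not raise this held-out likelihood. The function name, the plain KDE with a single fixed bandwidth, and the toy vectors are simplifying assumptions, not the FLS implementation, which operates on perceptual feature embeddings with fitted density parameters:

```python
import math

def feature_log_likelihood(gen_feats, test_feats, bandwidth=1.0):
    """Average log-likelihood of held-out test feature vectors under a
    Gaussian KDE centered on generated-sample feature vectors.
    Higher is better: it rewards fidelity/diversity while penalizing
    densities that merely memorize training data."""
    d = len(gen_feats[0])
    norm = -0.5 * d * math.log(2 * math.pi * bandwidth ** 2)
    total = 0.0
    for x in test_feats:
        # log-sum-exp over kernel contributions from each generated sample
        logs = []
        for g in gen_feats:
            sq = sum((xi - gi) ** 2 for xi, gi in zip(x, g))
            logs.append(norm - sq / (2 * bandwidth ** 2))
        m = max(logs)
        total += m + math.log(sum(math.exp(lv - m) for lv in logs)) - math.log(len(gen_feats))
    return total / len(test_feats)
```

A test sample close to the generated density scores higher than one far from it, which is the basic mechanism the metric builds on.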


Vanishing Gradients in Reinforcement Finetuning of Language Models

Razin, Noam, Zhou, Hattie, Saremi, Omid, Thilak, Vimal, Bradley, Arwen, Nakkiran, Preetum, Susskind, Joshua, Littwin, Etai

arXiv.org Machine Learning

Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which entails maximizing a (possibly learned) reward function using policy gradient algorithms. This work highlights a fundamental optimization obstacle in RFT: we prove that the expected gradient for an input vanishes when its reward standard deviation under the model is small, even if the expected reward is far from optimal. Through experiments on an RFT benchmark and controlled environments, as well as a theoretical analysis, we then demonstrate that vanishing gradients due to small reward standard deviation are prevalent and detrimental, leading to extremely slow reward maximization. Lastly, we explore ways to overcome vanishing gradients in RFT. We find the common practice of an initial supervised finetuning (SFT) phase to be the most promising candidate, which sheds light on its importance in an RFT pipeline. Moreover, we show that a relatively small number of SFT optimization steps on as few as 1% of the input samples can suffice, indicating that the initial SFT phase need not be expensive in terms of compute and data labeling efforts. Overall, our results emphasize that being mindful of inputs whose expected gradient vanishes, as measured by the reward standard deviation, is crucial for successful execution of RFT.
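The vanishing-gradient claim can be made concrete with a toy two-action softmax policy (a hypothetical example for intuition, not the paper's benchmark or proof). For a single logit, the exact policy gradient of the expected reward is proportional to the gap between the two rewards, so when the reward spread under the model is small the gradient nearly vanishes even though the expected reward is far from its optimum:

```python
import math

def expected_grad(logit, r_correct, r_wrong):
    """Exact gradient d/dlogit of the expected reward for a two-action
    softmax policy: p * (1 - p) * (r_correct - r_wrong), where p is the
    probability of the 'correct' action. Toy setup for illustration."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return p * (1.0 - p) * (r_correct - r_wrong)

# Rewards nearly identical (tiny reward standard deviation under the model)
# but both far from the optimum of 1.0: the expected gradient is ~0.0025,
# two orders of magnitude below the well-separated case (0.25).
g_small_std = expected_grad(0.0, 0.11, 0.10)
g_large_std = expected_grad(0.0, 1.00, 0.00)
print(g_small_std, g_large_std)
```

This mirrors the paper's point: the reward standard deviation, not the distance from the optimal reward, controls the gradient magnitude.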


Transfer Learning with TensorFlowJS

#artificialintelligence

In practice, I believe that in most cases, rather than creating models from scratch, you will use models that are already trained and solve a problem close to yours. This technique is called Transfer Learning. As you may already know, one big issue with training models from scratch is that we need to collect and label a huge amount of data, which is time-consuming work that may not be affordable for your project. It is also computationally very expensive to train a neural network on millions of images, and it may require weeks of training on multiple GPUs. The mental workflow of Transfer Learning is depicted in figure 1.
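The workflow the post describes can be sketched in miniature: a frozen pretrained model supplies fixed feature vectors, and only a small new head is trained on them. The `embed` stand-in, the logistic head, and all the numbers below are illustrative assumptions; in TensorFlow.js you would instead truncate a pretrained model (e.g. MobileNet) and train a new dense layer on its activations:

```python
import math

def embed(x):
    # Placeholder for the frozen base model's feature extractor; its weights
    # are never updated during head training.
    return [x, x * x]

def train_head(data, labels, lr=0.1, steps=300):
    """Fit a logistic-regression head on frozen features with plain SGD."""
    w, b = [0.0, 0.0], 0.0
    feats = [embed(x) for x in data]  # computed once: the base is frozen
    for _ in range(steps):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                      # gradient of log-loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wi * fi for wi, fi in zip(w, embed(x))) + b
    return 1 if z > 0 else 0

# Tiny toy task: classify the sign of x from the frozen features.
w, b = train_head([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
print([predict(w, b, x) for x in (-1.5, 1.5)])
```

Because only the small head is trained, the data and compute requirements shrink dramatically, which is the whole appeal the post is describing.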