Data Diet


Deep Learning on a Data Diet: Finding Important Examples Early in Training

Neural Information Processing Systems

Recent success in deep learning has partially been driven by training increasingly overparametrized networks on ever larger datasets. It is therefore natural to ask: how much of the data is superfluous, which examples are important for generalization, and how do we find them? In this work, we make the striking observation that, in standard vision datasets, simple scores averaged over several weight initializations can be used to identify important examples very early in training. We propose two such scores--the Gradient Normed (GraNd) and the Error L2-Norm (EL2N) scores--and demonstrate their efficacy on a range of architectures and datasets by pruning significant fractions of training data without sacrificing test accuracy. In fact, using EL2N scores calculated a few epochs into training, we can prune half of the CIFAR10 training set while slightly improving test accuracy.
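The EL2N score described above (the L2 norm of the error vector, i.e. the softmax output minus the one-hot label, averaged over several runs) can be sketched in a few lines. A minimal NumPy sketch, assuming softmax outputs have already been collected from several independently initialized models; function and variable names are illustrative, not from the paper's codebase:

```python
import numpy as np

def el2n_scores(probs, labels, num_classes):
    """EL2N score: L2 norm of the error vector (softmax output minus
    one-hot label), averaged over independent runs.

    probs:  array of shape (runs, n_examples, num_classes) holding softmax
            outputs from several independently initialized/trained models.
    labels: integer array of shape (n_examples,).
    """
    onehot = np.eye(num_classes)[labels]        # (n_examples, num_classes)
    errors = probs - onehot[None, :, :]         # broadcast over runs
    norms = np.linalg.norm(errors, axis=-1)     # (runs, n_examples)
    return norms.mean(axis=0)                   # average over runs

def prune_by_score(scores, keep_fraction):
    """Keep only the highest-scoring (hardest) fraction of examples."""
    k = int(len(scores) * keep_fraction)
    return np.argsort(scores)[-k:]
```

A perfectly predicted example has error vector zero and hence score zero, so pruning by score discards the examples the models already find easy.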


Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

Neural Information Processing Systems

A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that--after just a few hundred steps of dense training--the method can find a sparse sub-network that can be trained to the same accuracy as the dense network. However, the same does not hold at step 0, i.e. random initialization. In this work, we seek to understand how this early phase of pre-training leads to a good initialization for IMP both through the lens of the data distribution and the loss landscape geometry. Empirically we observe that, holding the number of pre-training iterations constant, training on a small fraction of (randomly chosen) data suffices to obtain an equally good initialization for IMP. We additionally observe that by pre-training only on "easy" training data, we can decrease the number of steps necessary to find a good initialization for IMP compared to training on the full dataset or a randomly chosen subset.


Self-Driving Cars Are Being Put on a Data Diet

WIRED

For self-driving-car developers, like many iPhone and Google Photos users, the growing cost of storing files on the cloud has become a nagging headache. Early on, robocar companies pursued a brute-force approach to maximize miles and data. "We could take all the data the cars have seen over time, the hundreds of thousands of pedestrians, cyclists, and vehicles, [and] take from that a model of how we expect them to move," said Chris Urmson, an early leader of Google's self-driving project, in a 2015 TED Talk. Urmson spoke at a time when autonomous vehicle prototypes were relatively few and the handful of companies testing them could afford to keep almost every data point they scooped up from the road. But nearly a decade later, Google's project and many others have fallen far behind their own predictions of the timeline for success.


Does "Deep Learning on a Data Diet" reproduce? Overall yes, but GraNd at Initialization does not

Kirsch, Andreas

arXiv.org Artificial Intelligence

The paper 'Deep Learning on a Data Diet' by Paul et al. (2021) introduces two innovative metrics for pruning datasets during the training of neural networks. While we are able to replicate the results for the EL2N score at epoch 20, the same cannot be said for the GraNd score at initialization. The GraNd scores later in training provide useful pruning signals, however. The GraNd score at initialization calculates the average gradient norm of an input sample across multiple randomly initialized models before any training has taken place. Our analysis reveals a strong correlation between the GraNd score at initialization and the input norm of a sample, suggesting that the latter could have been a cheap new baseline for data pruning. Unfortunately, neither the GraNd score at initialization nor the input norm surpasses random pruning in performance. This contradicts one of the findings in Paul et al. (2021). We were unable to reproduce their CIFAR-10 results using both an updated version of the original JAX repository and a newly implemented PyTorch codebase. An investigation of the underlying JAX/FLAX code from 2021 surfaced a bug in the checkpoint-restoring code that was fixed in April 2021 (https://github.com/google/flax/commit/28fbd95500f4bf2f9924d2560062fa50e919b1a5).
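The reported correlation between GraNd at initialization and the input norm has a simple analytic analogue in the linear case, which may help build intuition: for a linear softmax classifier, the cross-entropy gradient with respect to the weight matrix is the outer product (p - y)x^T, whose Frobenius norm factorizes exactly as ||p - y|| * ||x||. At a small random initialization p is close to uniform for every example, so the input norm dominates the score. A minimal NumPy sketch of this factorization (all names illustrative; this is a simplified linear model, not the networks studied in the paper):

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # numerical stability
    e = np.exp(z)
    return e / e.sum()

def grand_at_init_linear(x, y_onehot, W):
    """GraNd for a linear softmax classifier: the cross-entropy gradient
    w.r.t. W for one example is the outer product (p - y) x^T, so its
    Frobenius norm factorizes as ||p - y|| * ||x||."""
    p = softmax(W @ x)
    grad = np.outer(p - y_onehot, x)
    return np.linalg.norm(grad)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(10, 32))   # small random init: p near uniform
x = rng.normal(size=32)
y = np.eye(10)[3]
score = grand_at_init_linear(x, y, W)
```

Because ||p - y|| is nearly constant across examples at such an initialization, the score is essentially a rescaled input norm, mirroring the correlation the authors observe empirically.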


Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

Paul, Mansheej, Larsen, Brett W., Ganguli, Surya, Frankle, Jonathan, Dziugaite, Gintare Karolina

arXiv.org Machine Learning

A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that--after just a few hundred steps of dense training--the method can find a sparse sub-network that can be trained to the same accuracy as the dense network. However, the same does not hold at step 0, i.e. random initialization. In this work, we seek to understand how this early phase of pre-training leads to a good initialization for IMP both through the lens of the data distribution and the loss landscape geometry. Empirically we observe that, holding the number of pre-training iterations constant, training on a small fraction of (randomly chosen) data suffices to obtain an equally good initialization for IMP. We additionally observe that by pre-training only on "easy" training data, we can decrease the number of steps necessary to find a good initialization for IMP compared to training on the full dataset or a randomly chosen subset. Finally, we identify novel properties of the loss landscape of dense networks that are predictive of IMP performance, showing in particular that more examples being linearly mode connected in the dense network correlates well with good initializations for IMP. Combined, these results provide new insight into the role played by the early phase training in IMP.
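The IMP-with-rewinding loop the abstract refers to (train, prune the smallest-magnitude weights, rewind the survivors to an early checkpoint, repeat) can be sketched on a flat weight vector. This is a simplified illustration under stated assumptions, not the authors' implementation: `train_fn` is a placeholder for (sparse) training, and `init_weights` stands in for the saved early-training checkpoint:

```python
import numpy as np

def magnitude_prune(weights, mask, prune_frac):
    """Remove the smallest-magnitude fraction of the still-alive weights."""
    alive = weights[mask]
    k = int(len(alive) * prune_frac)
    if k == 0:
        return mask
    threshold = np.sort(np.abs(alive))[k - 1]
    return mask & (np.abs(weights) > threshold)

def imp_with_rewinding(init_weights, train_fn, rounds=3, prune_frac=0.2):
    """IMP with weight rewinding: after each round of training, prune by
    magnitude and rewind the surviving weights to the saved checkpoint
    (`init_weights`), then retrain under the new mask."""
    mask = np.ones_like(init_weights, dtype=bool)
    for _ in range(rounds):
        trained = train_fn(init_weights * mask, mask)  # placeholder training
        mask = magnitude_prune(trained, mask, prune_frac)
    return init_weights * mask, mask
```

Each round shrinks the mask multiplicatively (here by 20%), which is why a few rounds already yield a substantially sparse sub-network while the surviving weights remain tied to the early checkpoint.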


An Intelligence In Our Image: The Risks Of Bias And Errors In Artificial Intelligence - Liwaiwai

#artificialintelligence

Right now, artificial intelligence (AI) and countless algorithms are integrated into our daily lives. Because of the efficiency they bring to the table, the use of AI is only expected to widen. With humanity becoming more and more reliant on this technology, it is only natural to think about the implications. In contrast to the common impression that AI and algorithms are impartial and infallible, these technologies can fail miserably. William Welser IV and Osonde Osoba's An Intelligence in Our Image: The Risks of Bias and Errors in Artificial Intelligence evaluates algorithms and AI -- which they group together under the moniker "artificial agents" -- examining their shortcomings and how they can be combated.