On Model Stability as a Function of Random Seed
Pranava Madhyastha, Rishabh Jain
In this paper, we quantify model stability as a function of the random seed by investigating the effects of the induced randomness on model performance and on the robustness of the model in general. Specifically, we perform a controlled study of the effect of random seeds on the behaviour of attention-based, gradient-based, and surrogate-model-based (LIME) interpretations. Our analysis suggests that random seeds can adversely affect the consistency of models, resulting in counterfactual interpretations. We propose a technique called Aggressive Stochastic Weight Averaging (ASWA) and an extension called Norm-filtered Aggressive Stochastic Weight Averaging (NASWA), both of which improve the stability of models over random seeds. With our ASWA- and NASWA-based optimization, we improve the robustness of the original model, reducing the standard deviation of the model's performance by 72% on average.

1 Introduction

There has been tremendous growth in deep neural network based models that achieve state-of-the-art performance. In fact, most recent end-to-end deep learning models have surpassed the performance of models built on careful human feature engineering in a variety of NLP tasks. However, deep neural network based models are often brittle to various sources of randomness in their training. This randomness can be attributed to several sources, including, but not limited to, random parameter initialization, random sampling of examples during training, and random dropping of neurons (dropout). It has often been observed that these models have certain random seeds that yield better results than others.
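Since ASWA is only named above, a minimal sketch may help fix intuitions. Standard stochastic weight averaging (Izmailov et al., 2018) maintains a running average of the model weights, typically sampled at epoch boundaries; the "aggressive" variant, as the name suggests, updates the average after every iteration. The sketch below assumes a PyTorch-style training loop; the function name `aswa_update`, the toy model, and the dummy data are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

def aswa_update(avg_state, model, n_averaged):
    """Running average of model weights, updated after every batch.

    Standard SWA averages weights at epoch boundaries; this 'aggressive'
    variant applies the same incremental mean once per iteration.
    """
    for name, param in model.state_dict().items():
        if name not in avg_state:
            avg_state[name] = param.detach().clone()
        else:
            # incremental mean: avg += (w - avg) / (n + 1)
            avg_state[name] += (param.detach() - avg_state[name]) / (n_averaged + 1)
    return n_averaged + 1

# Hypothetical training loop showing where the per-batch update is applied.
model = nn.Linear(10, 2)  # stand-in for the actual NLP model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

avg_state, n_averaged = {}, 0
for x, y in [(torch.randn(4, 10), torch.randint(0, 2, (4,)))] * 5:  # dummy data
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    n_averaged = aswa_update(avg_state, model, n_averaged)  # average every batch

model.load_state_dict(avg_state)  # evaluate with the averaged weights
```

Averaging per iteration rather than per epoch smooths out far more of the optimization trajectory, which is the plausible mechanism behind the reduced seed-to-seed variance reported above; NASWA additionally filters which iterates enter the average based on a norm criterion, which the body of the paper details.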