Random Search as a Baseline for Sparse Neural Network Architecture Search
arXiv.org Artificial Intelligence
Overparameterized neural networks are loosely characterized as networks with a very high fitting (or memorization) capacity with respect to their training data. Although capable of memorizing their training data, these networks intriguingly achieve very low test error, close to their training error rates [1, 2]. Meanwhile, sparse neural networks have shown similar or better generalization performance than their dense counterparts while being more parameter-efficient [3]. With the increasing availability of hardware and software that support sparse computational operations [4, 5], there has been growing interest in finding sparse sub-networks within large overparameterized models, either to improve generalization performance or to gain computational efficiency at the same level of performance [6, 7, 8, 3]. Earlier work on creating efficient sparse sub-networks includes the now-popular pruning technique [9], motivated by the desire to achieve compute efficiency in resource-constrained applications by finding smaller networks within a larger network space without losing task performance [10]. The original pruning technique involves fully training a larger network on some task, discarding the task-irrelevant connections, and then fine-tuning the remaining sparse sub-network on the task to achieve a level of performance near that of the larger network. Connections were originally pruned based on loss Hessians [9, 11]; later, other techniques were proposed, such as removing weak connections based on weight-magnitude thresholds [12].
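To make the train-prune-fine-tune recipe concrete, the sketch below shows magnitude-based pruning in the spirit of [12]: after full training, the smallest-magnitude weights are zeroed out, and the resulting mask is re-applied during fine-tuning so pruned connections stay removed. This is a hedged illustration, not the exact procedure of any cited work; the helper names `magnitude_prune` and `apply_masks` and the sparsity level are assumptions for the example.

```python
# Minimal sketch of magnitude-based pruning (train -> prune weak weights -> fine-tune),
# assuming PyTorch; the 90% sparsity target is illustrative.
import torch
import torch.nn as nn


def magnitude_prune(model: nn.Module, sparsity: float = 0.9) -> dict:
    """Zero out the smallest-magnitude weights, keeping roughly a (1 - sparsity) fraction."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:  # skip biases and normalization parameters
            continue
        k = int(param.numel() * sparsity)
        if k == 0:
            continue
        # k-th smallest absolute value serves as the pruning threshold
        threshold = param.abs().flatten().kthvalue(k).values
        mask = (param.abs() > threshold).float()
        param.data.mul_(mask)  # remove the weak connections
        masks[name] = mask
    return masks


def apply_masks(model: nn.Module, masks: dict) -> None:
    """Re-apply masks after each optimizer step so pruned weights remain zero while fine-tuning."""
    for name, param in model.named_parameters():
        if name in masks:
            param.data.mul_(masks[name])
```

In a fine-tuning loop, one would call `apply_masks(model, masks)` after each `optimizer.step()` so that gradient updates do not revive pruned connections; PyTorch's `torch.nn.utils.prune` module offers similar functionality out of the box.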
Mar-14-2024