Capacity allocation analysis of neural networks: A tool for principled architecture design
Since the popularization of deep neural networks in the early 2010s, tailoring neural network architectures to specific tasks has been one of the main sources of activity for both academics and practitioners. Accordingly, a palette of empirical methods has been developed for automating the choice of neural network hyperparameters (a process sometimes called Neural Architecture Search), including - but not limited to - random search [2, 1], genetic algorithms [16, 13], Bayesian methods [24, 12] and reinforcement learning [29]. However, when the computational requirements for training a single model are high, such approaches might be too expensive or result in iteration cycles that are too long to be practically useful - though some work in that direction has been carried out recently [5, 14]. In other cases, when the loss function is only used as a proxy for the task at hand [25, 26, 10] or is not interpretable [8], a further perceptual evaluation is typically necessary to assess the quality of a model's outputs, and such systematic approaches at least partially break down. In both cases, an efficient and quantitative method to analyze and compare neural network architectures would be highly desirable - if only to narrow the search down to a limited set of plausible candidates to pass on to the more expensive (or manual) methods.

In this paper, we introduce the notion of capacity allocation analysis, which is a systematic, quantitative and computationally efficient way to analyze neural network architectures by quantifying which dependencies between inputs and outputs a parameter or set of parameters actually models. We develop a quantitative framework for assessing and comparing different architectures for a given task, providing insights that are complementary to the value of the loss function itself.
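To make the central question concrete, the sketch below illustrates one crude proxy for "which input-output dependencies does a given parameter group model": compare the input-output Jacobian of a network before and after ablating (zeroing) that parameter group. This is a hypothetical illustration of the kind of quantity capacity allocation analysis targets, not the method developed in this paper; the helper names io_jacobian and ablation_allocation, and the ablation-based proxy itself, are our own assumptions for this example.

```python
# Hypothetical proxy (not this paper's method): which entries of the
# input-output Jacobian does a given parameter group carry?

import torch
import torch.nn as nn

def io_jacobian(model, x):
    """Jacobian of model outputs w.r.t. a single 1D input point x."""
    x = x.clone().requires_grad_(True)
    y = model(x)
    rows = []
    for j in range(y.shape[0]):
        g, = torch.autograd.grad(y[j], x, retain_graph=True)
        rows.append(g)
    return torch.stack(rows)  # shape: (out_dim, in_dim)

def ablation_allocation(model, x, param_name):
    """Change in the Jacobian when the named parameter tensor is zeroed."""
    J_full = io_jacobian(model, x)
    param = dict(model.named_parameters())[param_name]
    saved = param.data.clone()
    param.data.zero_()                 # ablate the parameter group
    J_ablated = io_jacobian(model, x)
    param.data.copy_(saved)            # restore original weights
    # Large entries (i, j): dependencies of output i on input j
    # that this parameter group was responsible for.
    return (J_full - J_ablated).abs()

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2))
x = torch.randn(4)
print(ablation_allocation(model, x, "0.weight"))
```

Such pointwise ablation scores are noisy and entangled across parameter groups; the framework developed in the following sections replaces this ad-hoc proxy with a principled, quantitative notion of capacity.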