conditional capacity
Universal Batch Learning Under The Misspecification Setting
In this paper we consider the problem of universal {\em batch} learning in a misspecification setting with log-loss. In this setting the hypothesis class is a set of models $\Theta$. However, the data is generated by an unknown distribution that may not belong to this set but comes from a larger set of models $\Phi \supset \Theta$. Given a training sample, a universal learner is requested to predict a probability distribution for the next outcome and a log-loss is incurred. The universal learner performance is measured by the regret relative to the best hypothesis matching the data, chosen from $\Theta$. Utilizing the minimax theorem and information theoretical tools, we derive the optimal universal learner, a mixture over the set of the data generating distributions, and get a closed form expression for the min-max regret. We show that this regret can be considered as a constrained version of the conditional capacity between the data and its generating distributions set. We present tight bounds for this min-max regret, implying that the complexity of the problem is dominated by the richness of the hypotheses models $\Theta$ and not by the data generating distributions set $\Phi$. We develop an extension to the Arimoto-Blahut algorithm for numerical evaluation of the regret and its capacity achieving prior distribution. We demonstrate our results for the case where the observations come from a $K$-parameters multinomial distributions while the hypothesis class $\Theta$ is only a subset of this family of distributions.
Capacity allocation analysis of neural networks: A tool for principled architecture design
Since the popularization of deep neural networks in the early 2010s, tailoring neural network architectures to specific tasks has been one of the main sources of activity for both academics and practitioners. Accordingly, a palette of empirical methods has been developed for automating the choice of neural networks hyperparameters (a process sometimes called Neural Architecture Search), including - but not limited to - random search [2, 1], genetic algorithms [16, 13], bayesian methods [24, 12] or reinforcement learning [29]. However, when the computational requirements for training a single model are high, such approaches might be too expensive or result in iteration cycles that are too long to be practically useful - though some work in that direction has been carried out recently [5, 14]. In other cases, when the loss function is only used as a proxy for the task at hand [25, 26, 10] or is not interpretable [8], a further perceptual evaluation is typically necessary to evaluate the quality of a model's outputs and such systematic approaches at least partially break down. In both cases, an efficient and quantitative method to analyze and compare neural network architectures would be highly desirable - be it only to come up with a limited set of plausible candidates to pass on to the more expensive (or manual) methods. In this paper, we introduce the notion of capacity allocation analysis, which is a systematic, quantitative andcomputationally efficient way to analyze neural network architectures by quantifying which dependencies between inputs and outputs a parameter of a set of parameters actually model. We develop a quantitative framework for assessing and comparing different architectures for a given task, providing insights that are complementary to the value of the loss function itself.