Bibas, Koby, Fogel, Yaniv, Feder, Meir

Linear regression is a classical paradigm in statistics. A new look at it is provided via the lens of universal learning. In applying universal learning to linear regression, the hypothesis class models the label $y\in {\cal R}$ as a linear combination of the features, $x^T\theta$ with $x\in {\cal R}^M$, up to Gaussian noise. The Predictive Normalized Maximum Likelihood (pNML) solution for universal learning of individual data can be expressed analytically in this case, as can its associated learnability measure. Interestingly, the situation where the number of parameters $M$ is larger than the number of training samples $N$ can also be examined. As expected, in this case learnability cannot be attained in every situation; nevertheless, if the test vector resides mostly in a subspace spanned by the eigenvectors associated with the large eigenvalues of the empirical correlation matrix of the training data, linear regression can generalize even though it uses an ``over-parametrized'' model. We demonstrate the results with a simulation of fitting a polynomial to data, where the polynomial degree may be large.
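The eigenvector condition above can be illustrated numerically. The sketch below is not taken from the abstract: it assumes, purely for illustration, a ridge-regularized learnability score of the form $\log(1 + x^T (X^T X + \lambda I)^{-1} x)$, and the names `pnml_regret`, `lam`, `x_large` and `x_small` are hypothetical. It compares a test vector aligned with the largest eigenvalue of the empirical correlation matrix against one lying in its (near) null space when $M > N$.

```python
import numpy as np


def pnml_regret(X, x, lam=1e-3):
    """Assumed (illustrative) score: log(1 + x^T (X^T X + lam*I)^{-1} x).

    X   : (N, M) training feature matrix
    x   : (M,) test feature vector
    lam : small ridge term (an assumption) so the inverse exists when M > N
    """
    M = X.shape[1]
    gram = X.T @ X + lam * np.eye(M)
    return float(np.log1p(x @ np.linalg.solve(gram, x)))


rng = np.random.default_rng(0)
N, M = 10, 50  # over-parametrized: more parameters than training samples
X = rng.normal(size=(N, M))

# Eigen-decomposition of the (unnormalized) empirical correlation matrix.
# eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
x_large = eigvecs[:, -1]  # direction of the largest eigenvalue
x_small = eigvecs[:, 0]   # direction with (numerically) zero eigenvalue

regret_large = pnml_regret(X, x_large)
regret_small = pnml_regret(X, x_small)
print(f"test vector in large-eigenvalue direction: {regret_large:.4f}")
print(f"test vector in null-space direction:       {regret_small:.4f}")
```

Under this assumed form, the score stays small for the test vector in the well-observed direction and grows large for the one in the unobserved direction, which matches the qualitative statement in the abstract.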

Bibas, Koby, Fogel, Yaniv, Feder, Meir

The Predictive Normalized Maximum Likelihood (pNML) scheme has been recently suggested for universal learning in the individual setting, where both the training and test samples are individual data. The goal of universal learning is to compete with a ``genie'' or reference learner that knows the data values, but is restricted to use a learner from a given model class. The pNML minimizes the associated regret for any possible value of the unknown label. Furthermore, its min-max regret can serve as a pointwise measure of learnability for the specific training set and test sample. In this work we examine the pNML and its associated learnability measure for the Deep Neural Network (DNN) model class. As shown, the pNML outperforms the commonly used Empirical Risk Minimization (ERM) approach and provides robustness against adversarial attacks. Together with its learnability measure, it can detect out-of-distribution test examples, tolerate noisy labels, and serve as a confidence measure for the ERM. Finally, we extend the pNML to a ``twice universal'' solution that provides universality for model class selection and generates a learner competing with the best one from all model classes.

Fogel, Yaniv, Feder, Meir

Universal supervised learning is considered from an information theoretic point of view following the universal prediction approach, see Merhav and Feder (1998). We consider the standard supervised "batch" learning where prediction is done on a test sample once the entire training data is observed, and the individual setting where the features and labels, both in the training and test, are specific individual quantities. The information theoretic approach naturally uses the self-information loss, or log-loss. Our results provide universal learning schemes that compete with a "genie" (or reference) that knows the true test label. In particular, it is demonstrated that the main proposed scheme, termed Predictive Normalized Maximum Likelihood (pNML), is a robust learning solution that outperforms the current leading approach based on Empirical Risk Minimization (ERM). Furthermore, the pNML construction provides a pointwise indication for the learnability of the specific test challenge with the given training examples.
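For concreteness, the pNML construction described above is commonly written as follows. The notation is an assumption introduced here and does not appear in the abstract: $z^N$ denotes the training set, $x$ the test features, $\Theta$ the model class, and $\hat{\theta}(z^N, x, y)$ the maximum likelihood parameter when the test pair $(x, y)$ is appended to the training set.

```latex
% Genie: the best model in the class, chosen with knowledge of the true test label y
\hat{\theta}(z^N, x, y) = \arg\max_{\theta \in \Theta} \; p_\theta(y \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x_i)

% Log-loss regret of a learner q relative to the genie
R(z^N, x, y; q) = \log \frac{p_{\hat{\theta}(z^N, x, y)}(y \mid x)}{q(y \mid x)}

% pNML learner: minimizes the worst-case regret over y
q_{\mathrm{pNML}}(y \mid x) = \frac{p_{\hat{\theta}(z^N, x, y)}(y \mid x)}{\sum_{y'} p_{\hat{\theta}(z^N, x, y')}(y' \mid x)}

% Its regret is the same for every y and serves as the pointwise learnability measure
\Gamma(z^N, x) = \log \sum_{y'} p_{\hat{\theta}(z^N, x, y')}(y' \mid x)
```

For continuous labels the sum over $y'$ is replaced by an integral. A small $\Gamma$ indicates that the genie gains little from knowing the test label, so the test point can be learned from the given training examples.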