Nonvacuous Loss Bounds with Fast Rates for Neural Networks via Conditional Information Measures
Hellström, Fredrik, Durisi, Giuseppe
We present a framework to derive bounds on the test loss of randomized learning algorithms for the case of bounded loss functions. This framework leads to bounds that depend on the conditional information density between the output hypothesis and the choice of the training set, given a larger set of data samples from which the training set is formed. If the conditional information density is bounded uniformly in the size n of the training set, our bounds decay as 1/n, which is referred to as a fast rate. This is in contrast with the tail bounds involving conditional information measures available in the literature, which exhibit a less benign 1/√n dependence. We demonstrate the usefulness of our tail bounds by showing that they lead to estimates of the test loss achievable with several neural network architectures trained on MNIST and Fashion-MNIST that match the state-of-the-art bounds available in the literature.

In recent years, there has been a surge of interest in the use of information-theoretic techniques for bounding the loss of learning algorithms. While the first results of this flavor can be traced to the probably approximately correct (PAC)-Bayesian approach (McAllester, 1998; Catoni, 2007) (see also Guedj (2019) for a recent review), the connection between loss bounds and classical information-theoretic measures was made explicit in the works of Russo & Zou (2016) and Xu & Raginsky (2017), where bounds on the average population loss were derived in terms of the mutual information between the training data and the output hypothesis. Since then, these average loss bounds have been tightened (Bu et al., 2019; Asadi et al., 2018; Negrea et al., 2019). Furthermore, the information-theoretic framework has also been successfully applied to derive tail probability bounds on the population loss (Bassily et al., 2018; Esposito et al., 2019; Hellström & Durisi, 2020a). Of particular relevance to the present paper is the random-subset setting, introduced by Steinke & Zakynthinou (2020) and further studied in (Hellström & Durisi, 2020b; Haghifam et al., 2020).
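To make the contrast between the 1/√n and 1/n rates concrete, the LaTeX sketch below records schematic versions of the two bound shapes in the random-subset setting of Steinke & Zakynthinou (2020). The constants c_1, c_2 and the exact form of the conditioning are placeholders for illustration, not the paper's precise statements.

% Random-subset setting: draw a supersample \tilde{Z} of 2n samples, let a
% random binary vector S pick out the n training samples, and let W denote
% the output hypothesis. L_P and L_S are population and training loss, and
% the loss is bounded in [0, 1].

% Slow rate: average generalization bounds based on the conditional mutual
% information I(W; S | \tilde{Z}) scale as (cf. Steinke & Zakynthinou, 2020)
\mathbb{E}\!\left[ L_{P}(W) - L_{S}(W) \right]
  \le \sqrt{ \frac{2\, I(W; S \mid \tilde{Z})}{n} } .

% Fast-rate template: if the conditional information density
% \imath(W, S \mid \tilde{Z}) is bounded uniformly in n, tail bounds of the
% schematic form below hold with probability at least 1 - \delta, so the
% information term decays as 1/n rather than 1/\sqrt{n}:
L_{P}(W) \le c_{1}\, L_{S}(W)
  + \frac{ c_{2} \left( \imath(W, S \mid \tilde{Z}) + \log(1/\delta) \right) }{ n } .

The 1/n decay of the information term is what makes it possible for such bounds to remain nonvacuous at the training-set sizes used in the MNIST and Fashion-MNIST experiments.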
Oct-22-2020