Reasoning About Generalization via Conditional Mutual Information
Thomas Steinke, Lydia Zakynthinou
How can we ensure that a machine learning system produces an output that generalizes to the underlying distribution, rather than overfitting its training data? That is, how can we ensure that the hypotheses or models that are produced are reflective of the underlying population the training data was drawn from, rather than patterns that occur only by chance in the training data? This is perhaps the fundamental question for the science of statistical machine learning. A vast array of methods have been proposed to answer this question. Most notably, the theory of uniform convergence shows that, if the output is sufficiently "simple," then it cannot overfit too much. A more recent line of work has used distributional stability (in the form of differential privacy) to provide generalization guarantees that compose adaptively: that is, statistical validity is preserved even when a dataset is reused multiple times, with each successive analysis being influenced by the outcomes of prior analyses. Other methods for proving generalization include compression schemes and uniform stability. Unfortunately, these different methods for providing generalization guarantees are largely disconnected from one another; it is, in general, not possible to compare or combine techniques. In this paper, we provide a framework to reason about many of these differing approaches using the unifying language of information theory.
Jan-24-2020
- Country:
- North America > United States
- New York > New York County
- New York City (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Europe > Spain
- Andalusia > Cádiz Province > Cadiz (0.04)
- Genre:
- Research Report (0.50)
- Industry:
- Information Technology > Security & Privacy (0.67)