A Direct Sum Result for the Information Complexity of Learning
Ido Nachum, Jonathan Shafer, Amir Yehudayoff
How many bits of information are required to PAC learn a class of hypotheses of VC dimension $d$? The mathematical setting we follow is that of Bassily et al. (2018), where the value of interest is the mutual information $\mathrm{I}(S;A(S))$ between the input sample $S$ and the hypothesis output by the learning algorithm $A$. We introduce a class of functions of VC dimension $d$ over the domain $\mathcal{X}$ with information complexity at least $\Omega\left(d\log \log \frac{|\mathcal{X}|}{d}\right)$ bits for any consistent and proper algorithm (deterministic or randomized). Bassily et al. proved a similar (but quantitatively weaker) result for the case $d=1$. The above result is in fact a special case of a more general phenomenon we explore. We define the notion of information complexity of a given class of functions $\mathcal{H}$. Intuitively, it is the minimum amount of information that an algorithm for $\mathcal{H}$ must retain about its input to ensure consistency and properness. We prove a direct sum result for information complexity in this context; roughly speaking, the information complexity sums when combining several classes.
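The quantity $\mathrm{I}(S;A(S))$ can be made concrete with a toy example (not taken from the paper): for a deterministic learner $A$, $\mathrm{I}(S;A(S)) = \mathrm{H}(A(S))$, the entropy of the output hypothesis. The sketch below, under the assumption of a threshold class of VC dimension 1 on a four-point domain and a uniform two-point sample, computes this entropy for a consistent, proper learner that returns the smallest consistent threshold.

```python
import math
from itertools import product

# Hypothetical toy setup: the class of thresholds h_t(x) = 1[x >= t]
# on domain X = {0, 1, 2, 3} (VC dimension 1). The learner below is
# deterministic, consistent, and proper: it outputs the smallest
# threshold that agrees with every labeled example in the sample.

X = [0, 1, 2, 3]
THRESHOLDS = [0, 1, 2, 3, 4]  # h_t(x) = 1 iff x >= t

def learn(sample):
    """Return the smallest threshold t consistent with the labeled sample."""
    for t in THRESHOLDS:
        if all((x >= t) == y for x, y in sample):
            return t
    raise ValueError("no consistent hypothesis")

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Target concept t* = 2; the sample S is two i.i.d. uniform points
# from X, each labeled by the target.
target = 2
out_dist = {}
for x1, x2 in product(X, repeat=2):
    s = [(x1, x1 >= target), (x2, x2 >= target)]
    t = learn(s)
    out_dist[t] = out_dist.get(t, 0) + 1 / len(X) ** 2

# Since the learner is deterministic, I(S; A(S)) = H(A(S)).
print(round(entropy(out_dist), 3))
```

Lower-bound arguments of the kind stated in the abstract show that, for suitable classes and distributions, no choice of consistent proper learner can make this quantity small.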
Apr-15-2018