Estimating the Fundamental Limits is Easier than Achieving the Fundamental Limits
Jiao, Jiantao, Han, Yanjun, Fischer-Hwang, Irena, Weissman, Tsachy
Suppose there exist three machine learning experts that would like to understand the fundamental limits of classification (Bayes error) [1] for a specific dataset. Since the true distribution that generates the data is unknown, they take three different approaches: 1) Expert A: given empirical training samples, produce an estimate of the Bayes error that is (near) optimal statistically; 2) Expert B: construct a (near) optimal classifier based on the training sample, and then use its performance on the test set (may have infinite size) to estimate the Bayes error; 3) Expert C: use the training error of a (near) optimal classification algorithm to estimate the Bayes error. We ask the question: are there any fundamental differences between experts A, B, and C? Evidently, expert A is not constrained by any specific approaches as experts B and C are, but if B and C are using (near) optimal classification algorithms, would B or C achieve the same performance of A if A chooses to act optimally? Similar situations arise in the understanding of fundamental limits of data compression and sequential prediction under logarithmic loss, which is given by the Shannon entropy rate [2]. In this situation, there could exist four different experts: 1) A: would like to estimate the limits of compression (near) optimally; 2) B: would like to construct a predictor based on training samples and use its prediction accuracy under logarithmic loss on the test set (may have infinite size) to estimate the limits; 3) C: would like to use the training error of a (near) optimal sequential predictor to estimate the limits; 4) D: would like to construct a (near) optimal data compressor and use its normalized code length to estimate the limits. In this situation, are there any fundamental differences between the tasks of these four experts?
Oct-1-2017
- Country:
- North America > United States
- New York (0.04)
- California > Santa Clara County
- Palo Alto (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- Genre:
- Research Report (0.40)