Rethinking Fano's Inequality in Ensemble Learning

Terufumi Morishita, Gaku Morio, Shota Horiguchi, Hiroaki Ozaki, Nobuo Nukaga

arXiv.org Machine Learning 

We propose a fundamental theory on ensemble learning that answers the central question: what factors make an ensemble system good or bad? Previous studies used a variant of Fano's inequality of information theory and derived a lower bound of the classification error rate on the basis of the accuracy and diversity of models. We revisit the original Fano's inequality and argue that the studies did not take into account the information lost when multiple model predictions are combined into a final prediction. To address this issue, we generalize the previous theory to incorporate the information loss, which we name ...

The central question of ensemble learning has been: what factors make an ensemble system good or bad? It has been widely believed that accurate and diverse models lead to better performance for ensemble systems. Guided by this intuition, many heuristic metrics have been proposed to measure accuracy and diversity (Kohavi et al., 1996; Skalak et al., 1996; Cunningham & Carney, 2000; Shipp & Kuncheva, 2002). However, these metrics lack theoretical grounding, and indeed, through a broad range of experiments, Kuncheva & Whitaker (2003) empirically showed that there are no connections between the metrics and system performance. Turning to theoretical viewpoints, Geman et al. (1992) decomposed the squared error loss used in regression tasks into the bias and covariance of models. Here, bias corresponds to accuracy and covariance to diversity.
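
For reference, the original Fano's inequality mentioned in the abstract is usually stated as below. This is the standard textbook form, not the variant used in previous studies nor the generalization proposed in the paper. The symbols $Y$ (true label), $\hat{Y}$ (final prediction), $\mathcal{Y}$ (label set, assumed to have more than two elements), and $P_e$ (classification error rate) are notation assumed here for illustration.

```latex
% Standard Fano's inequality (textbook form; notation assumed for illustration).
% Y: true label, \hat{Y}: final prediction, P_e = \Pr(\hat{Y} \neq Y),
% H_b: binary entropy function, logarithms in nats, |\mathcal{Y}| > 2.
\begin{align}
  H(Y \mid \hat{Y}) &\le H_b(P_e) + P_e \log\bigl(|\mathcal{Y}| - 1\bigr) \\
% Using H(Y | \hat{Y}) = H(Y) - I(Y; \hat{Y}) and H_b(P_e) \le \log 2
% gives a lower bound on the error rate:
  P_e &\ge \frac{H(Y) - I(Y; \hat{Y}) - \log 2}{\log\bigl(|\mathcal{Y}| - 1\bigr)}
\end{align}
```

Read this way, the bound depends on how much information about $Y$ survives in the final prediction $\hat{Y}$, which is consistent with the abstract's point about information lost when model predictions are combined.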
