Multicalibration as Boosting for Regression
Globus-Harris, Ira, Harrison, Declan, Kearns, Michael, Roth, Aaron, Sorrell, Jessica
–arXiv.org Artificial Intelligence
We revisit the problem of boosting for regression, and develop a new agnostic regression boosting algorithm via a connection to multicalibration. In doing so, we shed additional light on multicalibration, a recent learning objective that has emerged from the algorithmic fairness literature [Hébert-Johnson et al., 2018]. In particular, we characterize multicalibration in terms of a "swap-regret" like condition, and use it to answer the question "what property must a collection of functions H have so that multicalibration with respect to H implies Bayes optimality?", giving a complete answer to problem asked by Burhanpurkar et al. [2021]. Using our swap-regret characterization, we derive an especially simple algorithm for learning a multicalibrated predictor for a class of functions H by reduction to a standard squared-error regression algorithm for H. The same algorithm can also be analyzed as a boosting algorithm for squared error regression that makes calls to a weak learner for squared error regression on subsets of the original data distribution without the need to relabel examples (in contrast to Gradient Boosting as well as existing multicalibration algorithms). This lets us specify a weak learning condition that is sufficient for convergence to the Bayes optimal predictor (even if the Bayes optimal predictor does not have zero error), avoiding the kinds of realizability assumptions that are implicit in analyses of boosting algorithms that converge to zero error. We conclude that ensuring multicalibration with respect to H corresponds to boosting for squared error regression in which H forms the set of weak learners. Finally we define a weak learning condition for H relative to a constrained class of functions C (rather than with respect to the Bayes optimal predictor). We show that multicalibration with respect to H implies multicalibration with respect to C if H satisfies the weak learning condition with respect to C, which in turn implies accuracy at least that of the best function in C. Multicalibration Consider a distribution D P Z defined over a domain Z " X ˆ R of feature vectors x P X paired with real valued labels y.
arXiv.org Artificial Intelligence
Jan-31-2023
- Country:
- Europe
- France (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Europe
- Genre:
- Research Report (0.50)
- Technology: