Active learning for misspecified generalized linear models
Active learning refers to algorithmic frameworks aimed at selecting training data points in order to reduce the number of required training data points and/or improve the generalization performance of a learning method. In this paper, we present an asymptotic analysis of active learning for generalized linear models. Our analysis holds under the common practical situation of model misspecification, and is based on realistic assumptions regarding the nature of the sampling distributions, which are usually neither independent nor identical. We derive unbiased estimators of generalization performance, as well as estimators of the expected reduction in generalization error after adding a new training data point, which allow us to optimize the sampling distribution of that new point through a convex optimization problem. Our analysis naturally leads to an algorithm for sequential active learning that is applicable to all tasks supported by generalized linear models (e.g., binary classification, multi-class classification, regression) and can be applied in nonlinear settings through the use of Mercer kernels.
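To make the sequential setting concrete, below is a minimal, hypothetical sketch of a sequential active-learning loop for a generalized linear model (logistic regression) on a synthetic pool. The acquisition score is a simple predictive-entropy heuristic used only as a placeholder; it is not the paper's asymptotically unbiased error-reduction estimator, and names such as `oracle`, `pool`, and the constants are illustrative assumptions.

```python
# Hypothetical sketch: sequential active learning with a (possibly misspecified)
# generalized linear model.  Placeholder acquisition rule, not the paper's method.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic pool of unlabeled candidates and a label oracle standing in for
# the true (unknown, possibly non-GLM) data-generating process.
pool = rng.normal(size=(500, 2))
true_w = np.array([1.5, -2.0])


def oracle(X):
    """Return noisy binary labels from the underlying process."""
    p = 1.0 / (1.0 + np.exp(-X @ true_w))
    return (rng.random(len(X)) < p).astype(int)


# Seed labeled set; resample until both classes are present so the GLM can be fit.
labeled_idx = list(rng.choice(len(pool), size=10, replace=False))
y_labeled = oracle(pool[labeled_idx])
while len(np.unique(y_labeled)) < 2:
    labeled_idx = list(rng.choice(len(pool), size=10, replace=False))
    y_labeled = oracle(pool[labeled_idx])

model = LogisticRegression()
for t in range(40):
    model.fit(pool[labeled_idx], y_labeled)

    # Acquisition: draw the next point from a distribution over the remaining
    # pool proportional to predictive entropy (placeholder for an optimized
    # sampling distribution).
    remaining = np.setdiff1d(np.arange(len(pool)), labeled_idx)
    p = model.predict_proba(pool[remaining])[:, 1]
    entropy = -(p * np.log(p + 1e-12) + (1 - p) * np.log(1 - p + 1e-12))
    probs = entropy / entropy.sum()
    new_idx = rng.choice(remaining, p=probs)

    # Query the label for the selected point and grow the training set.
    labeled_idx.append(new_idx)
    y_labeled = np.append(y_labeled, oracle(pool[[new_idx]]))
```

The same loop structure carries over to other GLM tasks (multi-class classification, regression) by swapping the model and the acquisition score; kernelizing the features would extend it to nonlinear settings.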
Neural Information Processing Systems
Dec-31-2007