low-complexity nonparametric bayesian online prediction
Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees
We propose a novel nonparametric online predictor for discrete labels conditioned on multivariate continuous features. The predictor is based on a feature space discretization induced by a full-fledged k-d tree with randomly picked directions and a recursive Bayesian distribution, which allows to automatically learn the most relevant feature scales characterizing the conditional distribution. We prove its pointwise universality, i.e., it achieves a normalized log loss performance asymptotically as good as the true conditional entropy of the labels given the features. The time complexity to process the n-th sample point is O(log n) in probability with respect to the distribution generating the data points, whereas other exact nonparametric methods require to process all past observations. Experiments on challenging datasets show the computational and statistical efficiency of our algorithm in comparison to standard and state-of-the-art methods.
Reviews: Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees
A reader unfamiliar with the Context Tree Weighting technique might have a difficult first read, but as it is a well-known technique in information theory and its applications, I don't think this should be held against it. Context Tree Weighting based variants have been used successfully in many different problems (data compression, bioinformatics), but typically deal with relatively low-dimensional binary side information, so this paper provides a method that fills this gap, and in my mind could be built on further by the machine learning community in the future. Some suggestions for future work: - There is a body of literature which improves on the KT estimator on sequences where the entire alphabet isn't observed, which is a common case when the data is recursively partitioned. I am quite certain the method advocated in this approach could benefit from applying the recently introduced SAD estimator (https://arxiv.org/abs/1305.3671) in place of the multi-KT, and the theoretical results extended to this case.
Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees
We propose a novel nonparametric online predictor for discrete labels conditioned on multivariate continuous features. The predictor is based on a feature space discretization induced by a full-fledged k-d tree with randomly picked directions and a recursive Bayesian distribution, which allows to automatically learn the most relevant feature scales characterizing the conditional distribution. We prove its pointwise universality, i.e., it achieves a normalized log loss performance asymptotically as good as the true conditional entropy of the labels given the features. The time complexity to process the n-th sample point is O(log n) in probability with respect to the distribution generating the data points, whereas other exact nonparametric methods require to process all past observations. Experiments on challenging datasets show the computational and statistical efficiency of our algorithm in comparison to standard and state-of-the-art methods.
Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees
LHERITIER, Alix, Cazals, Frederic
We propose a novel nonparametric online predictor for discrete labels conditioned on multivariate continuous features. The predictor is based on a feature space discretization induced by a full-fledged k-d tree with randomly picked directions and a recursive Bayesian distribution, which allows to automatically learn the most relevant feature scales characterizing the conditional distribution. We prove its pointwise universality, i.e., it achieves a normalized log loss performance asymptotically as good as the true conditional entropy of the labels given the features. The time complexity to process the n-th sample point is O(log n) in probability with respect to the distribution generating the data points, whereas other exact nonparametric methods require to process all past observations. Experiments on challenging datasets show the computational and statistical efficiency of our algorithm in comparison to standard and state-of-the-art methods.