multilabel classification
Bandit and Delayed Feedback in Online Structured Prediction
Online structured prediction is a task of sequentially predicting outputs with complex structures based on inputs and past observations, encompassing online classification. Recent studies showed that in the full-information setting, we can achieve finite bounds on the surrogate regret, i.e., the extra target loss relative to the best possible surrogate loss. In practice, however, full-information feedback is often unrealistic as it requires immediate access to the whole structure of complex outputs. Motivated by this, we propose algorithms that work with less demanding feedback, bandit and delayed feedback. For bandit feedback, by using a standard inverseweighted gradient estimator, we achieve a surrogate regret bound of O( KT) for the time horizon T and the size of the output set K. However, K can be extremely large when outputs are highly complex, resulting in an undesirable bound. To address this issue, we propose another algorithm that achieves a surrogate regret bound of O(T2/3), which is independent of K. This is achieved with a carefully designed pseudo-inverse matrix estimator. Furthermore, we numerically compare the performance of these algorithms, as well as existing ones. Regarding delayed feedback, we provide algorithms and regret analyses that cover various scenarios, including full-information and bandit feedback, as well as fixed and variable delays.
Supplementary material for " Regret Bounds for Multilabel Classification in Sparse Label Regimes "
This appendix contains all proofs of the results mentioned in the main body of the paper, plus further results which have been omitted there due to space limits. We recall the following lemma which upper bounds the probability measure of the ball around a point x X that contains its kth nearest neighbors. The proof immediately follows from the multiplicative Chernoff bound (see, e.g., Lemma 3.2 in [28]). When combined with Assumption 5.1 we obtain the following corollary. Corollary A.2. Suppose that the measure-smoothness assumption (Assumption 5.1) holds with parameters ฮป, Cฮป, k k.
Multilabel Classification by Hierarchical Partitioning and Data-dependent Grouping
In modern multilabel classification problems, each data instance belongs to a small number of classes among a large set of classes. In other words, these problems involve learning very sparse binary label vectors. Moreover, in the large-scale problems, the labels typically have certain (unknown) hierarchy. In this paper we exploit the sparsity of label vectors and the hierarchical structure to embed them in low-dimensional space using label groupings. Consequently, we solve the classification problem in a much lower dimensional space and then obtain labels in the original space using an appropriately defined lifting.
Regret Bounds for Multilabel Classification in Sparse Label Regimes
Multi-label classification (MLC) has wide practical importance, but the theoretical understanding of its statistical properties is still limited. As an attempt to fill this gap, we thoroughly study upper and lower regret bounds for two canonical MLC performance measures, Hamming loss and Precision@$\kappa$. We consider two different statistical and algorithmic settings, a non-parametric setting tackled by plug-in classifiers \`a la $k$-nearest neighbors, and a parametric one tackled by empirical risk minimization operating on surrogate loss functions. For both, we analyze the interplay between a natural MLC variant of the low noise assumption, widely studied in binary classification, and the label sparsity, the latter being a natural property of large-scale MLC problems. We show that those conditions are crucial in improving the bounds, but the way they are tangled is not obvious, and also different across the two settings.