Nguyen, Vu-Linh
Probabilistic Multi-Dimensional Classification
Nguyen, Vu-Linh, Yang, Yang, de Campos, Cassio
Multi-dimensional classification (MDC) can be employed in a range of applications where one needs to predict multiple class variables for each given instance. Many existing MDC methods suffer from at least one of the following drawbacks: inaccuracy, poor scalability, applicability limited to certain types of data, difficulty of interpretation, or lack of probabilistic (uncertainty) estimates. This paper attempts to address all of these disadvantages simultaneously. We propose a formal framework for probabilistic MDC in which learning an optimal multi-dimensional classifier can be decomposed, without loss of generality, into learning a set of (smaller) single-variable multi-class probabilistic classifiers and a directed acyclic graph. Current and future developments in both probabilistic classification and graphical model learning can directly enhance our framework, which is flexible and provably optimal. A collection of experiments is conducted to highlight the usefulness of this MDC framework.
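A minimal sketch of this decomposition, assuming scikit-learn, integer-encoded class variables, and a greedy prediction pass over a topological order of the assumed DAG (all names here are illustrative, not the paper's implementation):

```python
# Sketch: one probabilistic classifier per class variable, conditioned on
# the features and on the variable's parents in a DAG over the classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_mdc(X, Y, parents):
    """Y: (n, d) integer class-variable matrix; parents[j] lists the
    Y-columns that are parents of class variable j in the DAG."""
    models = {}
    for j in range(Y.shape[1]):
        Xj = np.hstack([X, Y[:, parents[j]]])  # augment features with parent classes
        models[j] = LogisticRegression(max_iter=1000).fit(Xj, Y[:, j])
    return models

def predict_mdc(models, x, parents, order):
    """Predict class variables greedily along a topological order of the DAG,
    so that parents are always predicted before their children."""
    y = np.zeros(len(models), dtype=int)
    for j in order:
        xj = np.hstack([x, y[parents[j]]])
        y[j] = models[j].predict(xj.reshape(1, -1))[0]
    return y
```

Each per-variable learner can be swapped for any probabilistic multi-class classifier, which is what makes the framework flexible.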
Skeptical inferences in multi-label ranking with sets of probabilities
Alarcón, Yonatan Carlos Carranza, Nguyen, Vu-Linh
Multi-label classification (MLC) problems arise in a number of applications, including text categorization [1, 2], music categorization [3], semantic scene classification [4], and protein function classification [5]; we refer to [6] and [7] for comprehensive surveys of the topic. It is quite common in applications for the multi-label learner to output a ranking for each query instance, that is, a ranking of labels from most likely relevant to most likely irrelevant. A prediction of that kind is commonly evaluated in terms of the rank loss, which is the fraction of incorrectly ordered label pairs, where a relevant and an irrelevant label are incorrectly ordered if the former does not precede the latter [8, 9, 10]. The problem of making skeptical inferences for MLC in the presence of uncertainty has been studied in the literature [11, 12, 13].
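For intuition, the rank loss can be computed directly from a binary relevance vector and predicted scores; a small worked example (counting ties as half an error, a common convention):

```python
# Rank loss: fraction of (relevant, irrelevant) label pairs that the
# predicted ranking orders incorrectly.
import numpy as np

def rank_loss(y_true, scores):
    """y_true: binary relevance vector; scores: higher = ranked more relevant.
    Assumes at least one relevant and one irrelevant label."""
    rel = np.where(y_true == 1)[0]
    irr = np.where(y_true == 0)[0]
    errors = 0.0
    for r in rel:
        for i in irr:
            if scores[r] < scores[i]:
                errors += 1.0       # inverted pair
            elif scores[r] == scores[i]:
                errors += 0.5       # tie counts as half an error
    return errors / (len(rel) * len(irr))

# Labels 0 and 2 are relevant; only the pair (label 2, label 1) is inverted,
# so the loss is 1 out of 4 pairs:
print(rank_loss(np.array([1, 0, 1, 0]), np.array([0.9, 0.8, 0.3, 0.1])))  # 0.25
```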
Learning Gradient Boosted Multi-label Classification Rules
Rapp, Michael, Mencía, Eneldo Loza, Fürnkranz, Johannes, Nguyen, Vu-Linh, Hüllermeier, Eyke
In multi-label classification, where the evaluation of predictions is less straightforward than in single-label classification, various meaningful, though different, loss functions have been proposed. Ideally, the learning algorithm should be customizable towards a specific choice of the performance measure. Modern implementations of boosting, most prominently gradient boosted decision trees, appear to be appealing from this point of view. However, they are mostly limited to single-label classification, and hence not amenable to multi-label losses unless these are label-wise decomposable. In this work, we develop a generalization of the gradient boosting framework to multi-output problems and propose an algorithm for learning multi-label classification rules that is able to minimize decomposable as well as non-decomposable loss functions. Using the well-known Hamming loss and subset 0/1 loss as representatives, we analyze the abilities and limitations of our approach on synthetic data and evaluate its predictive performance on multi-label benchmarks.
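To make the distinction concrete, here is a small illustration of the two representative losses: Hamming loss decomposes over labels, while subset 0/1 loss only asks whether the entire label vector is exactly correct:

```python
# Hamming loss is label-wise decomposable; subset 0/1 loss is not.
import numpy as np

def hamming_loss(y_true, y_pred):
    return np.mean(y_true != y_pred)                      # average per-label error

def subset_zero_one_loss(y_true, y_pred):
    return float(not np.array_equal(y_true, y_pred))      # all-or-nothing

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1])
print(hamming_loss(y_true, y_pred))          # 0.25 (one of four labels wrong)
print(subset_zero_one_loss(y_true, y_pred))  # 1.0  (vector not exactly right)
```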
On Aggregation in Ensembles of Multilabel Classifiers
Nguyen, Vu-Linh, Hüllermeier, Eyke, Rapp, Michael, Mencía, Eneldo Loza, Fürnkranz, Johannes
While a variety of ensemble methods for multilabel classification have been proposed in the literature, the question of how to aggregate the predictions of the individual members of the ensemble has received little attention so far. In this paper, we introduce a formal framework of ensemble multilabel classification, in which we distinguish two principal approaches: "predict then combine" (PTC), where the ensemble members first make loss minimizing predictions which are subsequently combined, and "combine then predict" (CTP), which first aggregates information such as marginal label probabilities from the individual ensemble members, and then derives a prediction from this aggregation. While both approaches generalize voting techniques commonly used for multilabel ensembles, they make it possible to take the target performance measure explicitly into account. Therefore, concrete instantiations of CTP and PTC can be tailored to concrete loss functions. Experimentally, we show that standard voting techniques are indeed outperformed by suitable instantiations of CTP and PTC, and provide some evidence that CTP performs well for decomposable loss functions, whereas PTC is the better choice for non-decomposable losses.
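A minimal sketch of the two schemes, instantiated for the Hamming loss (whose risk minimizer thresholds marginal label probabilities at 1/2); the array layout and function names are illustrative assumptions, not the paper's API:

```python
# `member_probs`: (m, d) array of marginal label probabilities produced by
# m ensemble members for one instance with d labels.
import numpy as np

def predict_then_combine(member_probs):
    # PTC: each member first makes its own Hamming-loss-minimizing
    # prediction; the predictions are then combined by majority vote.
    member_preds = (member_probs >= 0.5).astype(int)
    return (member_preds.mean(axis=0) >= 0.5).astype(int)

def combine_then_predict(member_probs):
    # CTP: the marginals are averaged first, and a single prediction is
    # then derived from the aggregated distribution.
    return (member_probs.mean(axis=0) >= 0.5).astype(int)
```

The two can disagree: a label that most members individually reject may still be predicted by CTP if a few members assign it very high probability.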
Epistemic Uncertainty Sampling
Nguyen, Vu-Linh, Destercke, Sébastien, Hüllermeier, Eyke
Various strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the label of those instances for which its current prediction is maximally uncertain. The predictions as well as the measures used to quantify the degree of uncertainty, such as entropy, are almost exclusively of a probabilistic nature. In this paper, we advocate a distinction between two different types of uncertainty, referred to as epistemic and aleatoric, in the context of active learning. Roughly speaking, these notions capture the reducible and the irreducible part of the total uncertainty in a prediction, respectively. We conjecture that, in uncertainty sampling, the usefulness of an instance is better reflected by its epistemic than by its aleatoric uncertainty. This leads us to suggest the principle of "epistemic uncertainty sampling", which we instantiate by means of a concrete approach for measuring epistemic and aleatoric uncertainty. In experimental studies, epistemic uncertainty sampling does indeed show promising performance.
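The paper instantiates the principle with a specific measure of epistemic and aleatoric uncertainty; purely as an illustration, the sketch below substitutes a common ensemble-based entropy decomposition, so treat it as a stand-in rather than the paper's method:

```python
# Illustrative stand-in: decompose the total predictive entropy of an
# ensemble into an aleatoric part (mean of member entropies) and an
# epistemic part (their disagreement), then query the instance whose
# epistemic uncertainty is largest. All names are hypothetical.
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def epistemic_uncertainty_sampling(member_probs_pool):
    """member_probs_pool: (m, n, k) class probabilities of m ensemble
    members for n unlabelled instances over k classes."""
    total = entropy(member_probs_pool.mean(axis=0))      # entropy of the mean
    aleatoric = entropy(member_probs_pool).mean(axis=0)  # mean of the entropies
    epistemic = total - aleatoric                        # disagreement term
    return int(np.argmax(epistemic))                     # index of instance to query
```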
Reliable Multi-label Classification: Prediction with Partial Abstention
Nguyen, Vu-Linh, Hüllermeier, Eyke
In statistics and machine learning, classification with abstention, also known as classification with a reject option, is an extension of the standard setting of classification in which the learner is allowed to refuse a prediction for a given query instance; research on this setting dates back to early work by Chow (1970) and Hellman (1970) and remains an important topic today (Cortes et al., 2016). For the learner, the main reason to abstain is a lack of certainty about the corresponding outcome: refusing, or at least deferring, a decision might then be better than taking a high risk of a wrong decision. Nowadays, there are many machine learning problems in which complex, structured predictions are sought (instead of scalar values, as in classification and regression). For such problems, the idea of abstaining from a prediction can be generalized toward partial abstention: instead of predicting the entire structure, the learner predicts only parts of it, namely those for which it is certain enough. This idea has already been realized, for example, for the problem of label ranking, where predictions are rankings (Cheng et al., 2012).
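As a simple illustration of partial abstention in multi-label classification, suppose the learner abstains label-wise whenever the marginal probability of relevance is too close to 1/2; the paper develops more principled thresholding, so the following is only a sketch:

```python
# Label-wise partial abstention: commit only to labels whose marginal
# probability of relevance is confidently away from 1/2.
import numpy as np

def predict_with_partial_abstention(marginals, eps=0.2):
    """Return 1 (relevant), 0 (irrelevant) or -1 (abstain) per label."""
    pred = np.full(marginals.shape, -1)
    pred[marginals >= 0.5 + eps] = 1
    pred[marginals <= 0.5 - eps] = 0
    return pred

print(predict_with_partial_abstention(np.array([0.95, 0.55, 0.1])))  # [ 1 -1  0]
```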
Querying Partially Labelled Data to Improve a K-nn Classifier
Nguyen, Vu-Linh (University of Technology of Compiègne), Destercke, Sébastien (University of Technology of Compiègne), Masson, Marie-Hélène (University of Technology of Compiègne and Université de Picardie Jules Verne)
When learning from instances whose output labels may be partial, the question arises of which of these labels should be made precise in order to improve the accuracy of predictions. This problem can be seen as the intersection of two tasks: learning from partial labels, and active learning, where the goal is to obtain the labels of additional instances to improve the model's accuracy. In this paper, we propose querying strategies for partial labels for the well-known K-nn classifier. We propose different criteria of increasing complexity, using, among other things, the amount of ambiguity that partial labels introduce into the K-nn decision rule. We then show that our strategies usually outperform simple baseline schemes, and that more complex strategies improve model accuracy more quickly.
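An illustrative (not the paper's exact) ambiguity-style criterion could score each partially labelled training instance by how often it appears among the K nearest neighbours of other instances while its candidate label set still leaves the vote ambiguous; a minimal sketch:

```python
# Score partially labelled instances by how often they can perturb a
# K-nn vote; query the highest-scoring one. Illustrative sketch only.
import numpy as np

def ambiguity_scores(X, candidate_sets, k=3):
    """candidate_sets[i]: set of labels still possible for instance i."""
    n = len(X)
    scores = np.zeros(n)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # an instance is not its own neighbour
    for q in range(n):
        for j in np.argsort(dists[q])[:k]:   # K nearest neighbours of q
            if len(candidate_sets[j]) > 1:   # j is only partially labelled,
                scores[j] += 1               # so it may flip q's K-nn vote
    return scores

# to_query = int(np.argmax(ambiguity_scores(X, candidate_sets)))
```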