Random Forest for Label Ranking

arXiv.org Machine Learning

Label ranking aims to learn a mapping from instances to rankings over a finite set of predefined labels. Random forest is one of the most powerful and successful general-purpose machine learning algorithms of modern times. There appears to be no prior work in the literature on applying random forest to label ranking. In this paper, we present a random forest label ranking method that uses random decision trees to retrieve nearest neighbors that are similar not only in the feature space but also in the ranking space. We develop a novel two-step rank aggregation strategy that aggregates the neighboring rankings discovered by the random forest into a final predicted ranking. Compared with existing methods, the new random forest method has several advantages, including an intrinsically scalable tree data structure, a highly parallelizable computational architecture, and superior performance. We present extensive experimental results demonstrating that the new method achieves the best predictive accuracy compared with state-of-the-art methods, both on datasets with complete rankings and on datasets with only partial ranking information.
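
To illustrate what aggregating neighboring rankings into one predicted ranking can look like, here is a minimal sketch using a plain Borda count. This is an assumption for illustration only: the abstract does not describe the paper's two-step strategy, and the function name `borda_aggregate` and the toy neighbor rankings are hypothetical. Each ranking is assumed to be a complete list of labels, best first.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate rankings (lists of labels, best first) with a Borda count.

    Illustrative stand-in only; not the two-step strategy from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        m = len(ranking)
        for position, label in enumerate(ranking):
            # The top label earns m points, the next m - 1, and so on.
            scores[label] += m - position
    # Highest total score first; ties are broken by label for determinism.
    return sorted(scores, key=lambda label: (-scores[label], label))


# Hypothetical rankings attached to three retrieved neighbors.
neighbors = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(borda_aggregate(neighbors))  # ['a', 'b', 'c']
```

In this toy case the totals are 8, 6, and 4 points for a, b, and c, so the aggregated ranking places a first.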


Empirical Evaluation of Ranking Trees on Some Metalearning Problems

AAAI Conferences

The problem of learning rankings is receiving increased attention from several research communities. In this paper, we empirically evaluate an adaptation of the decision tree learning algorithm to rankings. Our experiments are carried out on several metalearning problems, which consist of relating characteristics of learning problems to the relative performance of learning algorithms. We obtain positive results which, somewhat surprisingly, indicate that the method predicts the top ranks more accurately.


Multi-Prototype Label Ranking with Novel Pairwise-to-Total-Rank Aggregation

AAAI Conferences

We propose a multi-prototype-based algorithm for online learning of soft pairwise-preferences over labels. The algorithm learns soft label preferences via minimization of the proposed soft rank-loss measure, and can learn from total orders as well as from various types of partial orders. The soft pairwise preference algorithm outputs are further aggregated to produce a total label ranking prediction using a novel aggregation algorithm that outperforms existing aggregation solutions. Experiments on synthetic and real-world data demonstrate state-of-the-art performance of the proposed model.
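
As a rough illustration of turning soft pairwise preferences into a total ranking, the sketch below scores each label by the sum of its preference values over all other labels and sorts by that score. This is a simple baseline assumed for illustration, not the novel aggregation algorithm the abstract refers to; the name `rank_from_pairwise`, the nested-dictionary input format, and the example values are all hypothetical.

```python
def rank_from_pairwise(pref):
    """Build a total ranking from soft pairwise preferences.

    pref[a][b] is assumed to be a value in [0, 1] expressing how strongly
    label a is preferred over label b. Missing pairs count as 0.
    Baseline sketch only; not the aggregation proposed in the paper.
    """
    labels = sorted(pref)
    scores = {
        a: sum(pref[a].get(b, 0.0) for b in labels if b != a)
        for a in labels
    }
    # Strongest overall preference first; ties are broken by label.
    return sorted(labels, key=lambda a: (-scores[a], a))


# Hypothetical soft preferences over three labels.
pref = {
    "x": {"y": 0.8, "z": 0.6},
    "y": {"x": 0.2, "z": 0.7},
    "z": {"x": 0.4, "y": 0.3},
}
print(rank_from_pairwise(pref))  # ['x', 'y', 'z']
```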


ROAR: Robust Label Ranking for Social Emotion Mining

AAAI Conferences

Understanding and predicting latent emotions of users toward online content, known as social emotion mining, has become increasingly important to social platforms and businesses alike. Despite recent developments, however, very little attention has been paid to the issues of nuance, subjectivity, and bias in social emotions. In this paper, we fill this gap by formulating social emotion mining as a robust label ranking problem, and propose: (1) a robust measure, named G-mean-rank (GMR), which sets a formal criterion consistent with practical intuition; and (2) a simple yet effective label ranking model, named ROAR, that is more robust toward unbalanced datasets (which are common). Through comprehensive empirical validation using 4 real datasets and 16 benchmark semi-synthetic label ranking datasets, and a case study, we demonstrate the superiority of our proposals over 2 popular label ranking measures and 6 competing label ranking algorithms. The datasets and implementations used in the empirical validation are available for access.


Mining Rank Data

arXiv.org Machine Learning

The problem of frequent pattern mining has been studied quite extensively for various types of data, including sets, sequences, and graphs. Somewhat surprisingly, another important type of data, namely rank data, has received very little attention in data mining so far. In this paper, we therefore address the problem of mining rank data, that is, data in the form of rankings (total orders) of an underlying set of items. More specifically, two types of patterns are considered, namely frequent rankings and dependencies between such rankings in the form of association rules. Algorithms for mining frequent rankings and frequent closed rankings are proposed and tested experimentally, using both synthetic and real data.
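
To make the notion of a frequent ranking concrete, the sketch below computes the support of a candidate sub-ranking. It assumes that a ranking supports a pattern when the pattern's items appear in the same relative order. That definition, the function names `supports` and `support`, and the toy data are assumptions for illustration; the abstract does not give the paper's exact definitions or algorithms.

```python
def supports(ranking, pattern):
    """Return True if the pattern's items occur in `ranking` in the same order."""
    remaining = iter(ranking)
    # `item in remaining` consumes the iterator, so this is a subsequence test.
    return all(item in remaining for item in pattern)


def support(rankings, pattern):
    """Fraction of rankings that agree with the candidate sub-ranking."""
    return sum(supports(r, pattern) for r in rankings) / len(rankings)


# Hypothetical rank data: four total orders over the items a, b, c, d.
data = [
    ["a", "b", "c", "d"],
    ["b", "a", "c", "d"],
    ["a", "c", "b", "d"],
    ["d", "a", "b", "c"],
]
print(support(data, ["a", "b"]))       # 0.75
print(support(data, ["a", "b", "d"]))  # 0.5
```

A pattern would then count as frequent if its support reaches a user-chosen minimum threshold.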