Collaborating Authors

Wehenkel, Louis


From global to local MDI variable importances for random forests and when they are Shapley values

arXiv.org Machine Learning

Random forests have been widely used for their ability to provide so-called importance measures, which give insight at a global (per-dataset) level on the relevance of input variables to predict a certain output. On the other hand, methods based on Shapley values have been introduced to refine the analysis of feature relevance in tree-based models to a local (per-instance) level. In this context, we first show that the global Mean Decrease of Impurity (MDI) variable importance scores correspond to Shapley values under some conditions. Then, we derive a local MDI importance measure of variable relevance, which has a very natural connection with the global MDI measure and can be related to a new notion of local feature relevance. We further link local MDI importances with Shapley values and discuss them in the light of related measures from the literature. The measures are illustrated through experiments on several classification and regression problems.
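
As a concrete point of reference for the global side of this analysis, scikit-learn's tree ensembles already expose global MDI scores through the feature_importances_ attribute; the short Python sketch below, on an assumed synthetic dataset, shows how these per-dataset scores are obtained (the local, per-instance MDI measure introduced in the paper is not part of scikit-learn).

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy dataset: 5 informative features among 10 (synthetic, for illustration).
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Global MDI: impurity decrease at each split, weighted by the fraction of
# samples reaching the node, averaged over all trees of the forest.
for j, imp in enumerate(forest.feature_importances_):
    print(f"X{j}: {imp:.3f}")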


Understanding variable importances in forests of randomized trees

Neural Information Processing Systems

Despite growing interest and practical use in various scientific areas, variable importances derived from tree-based ensemble methods are not well understood from a theoretical point of view. In this work we characterize the Mean Decrease Impurity (MDI) variable importances as measured by an ensemble of totally randomized trees in asymptotic sample and ensemble size conditions. We derive a three-level decomposition of the information jointly provided by all input variables about the output in terms of i) the MDI importance of each input variable, ii) the degree of interaction of a given input variable with the other input variables, iii) the different interaction terms of a given degree. We then show that this MDI importance of a variable is equal to zero if and only if the variable is irrelevant and that the MDI importance of a relevant variable is invariant with respect to the removal or the addition of irrelevant variables. We illustrate these properties on a simple example and discuss how they may change in the case of non-totally randomized trees such as Random Forests and Extra-Trees.
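
For reference, the paper's central characterization can be stated compactly (up to notational details; here $V = \{X_1, \ldots, X_p\}$ denotes the set of input variables, $\mathcal{P}_k(V^{-m})$ the subsets of $V \setminus \{X_m\}$ of size $k$, and $C_p^k$ the binomial coefficient):

\[
\mathrm{Imp}(X_m) \;=\; \sum_{k=0}^{p-1} \frac{1}{C_p^k\,(p-k)} \sum_{B \in \mathcal{P}_k(V^{-m})} I(X_m; Y \mid B),
\]

together with the decomposition property $\sum_{m=1}^{p} \mathrm{Imp}(X_m) = I(X_1, \ldots, X_p; Y)$, from which the irrelevance and invariance properties summarized above follow.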


Gradient tree boosting with random output projections for multi-label classification and multi-output regression

arXiv.org Machine Learning

Multi-output supervised learning aims to model input-output relationships from observations of input-output pairs whenever the output space is a vector of random variables. Multi-output classification and regression tasks have numerous applications in domains ranging from biology to multimedia, and recent applications in this area involve very high-dimensional output spaces (Agrawal et al., 2013; Dekel and Shamir, 2010). Classification and regression trees (Breiman et al., 1984) are popular supervised learning methods that provide state-of-the-art performance when exploited in the context of ensemble methods, namely Random forests (Breiman, 2001; Geurts et al., 2006) and Boosting (Freund and Schapire, 1997; Friedman, 2001). Classification and regression trees can obviously be exploited to handle multi-output problems. The most straightforward way to address multi-output tasks is to apply standard single-output methods separately and independently on each output. Although simple, this method, called binary relevance (Tsoumakas et al., 2009) in multi-label classification or single target (Spyromitros-Xioufis et al., 2012) in multi-output regression, is often suboptimal as it does not exploit potential correlations that might exist between the outputs. Tree ensemble methods have however been explicitly extended by several authors to the joint prediction of multiple outputs (e.g., Segal, 1992; Blockeel et al., 2000). These extensions build a single tree to predict all outputs at once. They adapt the score measure used to assess splits during tree growth to take all outputs into account, and they label each tree leaf with a vector of values, one for each output.
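
To make the two strategies in this paragraph concrete, the hedged Python sketch below (scikit-learn on an assumed synthetic task; this is not the paper's boosting method) contrasts the single-target/binary-relevance baseline with a joint multi-output tree ensemble that scores splits over all outputs at once.

from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.multioutput import MultiOutputRegressor

# Synthetic multi-output regression task with 8 outputs.
X, Y = make_regression(n_samples=500, n_features=20, n_targets=8,
                       random_state=0)

# "Single target": one independent model per output, ignoring any
# correlations between outputs.
single = MultiOutputRegressor(
    ExtraTreesRegressor(n_estimators=100, random_state=0)).fit(X, Y)

# Joint multi-output trees: one ensemble predicting all outputs at once;
# split scores aggregate over outputs and each leaf holds an output vector.
joint = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, Y)

print(single.predict(X[:1]).shape, joint.predict(X[:1]).shape)  # (1, 8) (1, 8)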


Unit Commitment using Nearest Neighbor as a Short-Term Proxy

arXiv.org Artificial Intelligence

We devise the Unit Commitment Nearest Neighbor (UCNN) algorithm to be used as a proxy for quickly approximating the outcomes of short-term decisions, making tractable the hierarchical long-term assessment and planning of large power systems. Experimental results on updated versions of the IEEE-RTS79 and IEEE-RTS96 benchmarks show high accuracy, measured on operational cost, achieved in run-times that are several orders of magnitude lower than the traditional approach.
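
A minimal sketch of the proxy idea, under the assumption that a library of already-solved short-term instances is available; the scenario descriptors and costs below are synthetic placeholders, not the paper's actual UCNN features or benchmark data.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical library of solved instances: each row is a scenario
# descriptor (e.g., a 24-hour demand profile) paired with the operational
# cost returned by a full unit-commitment solve.
rng = np.random.default_rng(0)
scenarios = rng.uniform(size=(5000, 24))
costs = scenarios.sum(axis=1) * 1e4 + rng.normal(scale=1e3, size=5000)

# Nearest-neighbor proxy: look up the most similar solved scenario instead
# of re-solving the optimization for every sample of a long-term study.
proxy = KNeighborsRegressor(n_neighbors=1).fit(scenarios, costs)

new_scenario = rng.uniform(size=(1, 24))
print("approximate operational cost:", proxy.predict(new_scenario)[0])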


Random Subspace with Trees for Feature Selection Under Memory Constraints

arXiv.org Machine Learning

Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsets of variables, mixing variables already identified as relevant by previous models with variables randomly selected among the others. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite-sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.
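
A loose sketch of this iterative scheme, assuming only q features fit in memory at a time; the subset sizes and the above-average importance filter below are arbitrary illustrative choices, not the paper's exact algorithm.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Toy setting: 200 candidate features, but only q = 20 in memory at once.
X, y = make_classification(n_samples=500, n_features=200, n_informative=10,
                           n_redundant=0, random_state=0)
p, q, n_iter = X.shape[1], 20, 30
rng = np.random.default_rng(0)
relevant = []  # features flagged as relevant by earlier models

for _ in range(n_iter):
    # Candidate subset: features already found relevant, padded with
    # features drawn at random among the others.
    kept = relevant[:q // 2]
    others = [f for f in range(p) if f not in kept]
    subset = kept + list(rng.choice(others, size=q - len(kept), replace=False))

    model = ExtraTreesClassifier(n_estimators=50, random_state=0)
    model.fit(X[:, subset], y)

    # Retain features with above-average MDI importance for the next round.
    imp = model.feature_importances_
    relevant = [subset[i] for i in np.argsort(imp)[::-1] if imp[i] > imp.mean()]

print("selected features:", sorted(relevant))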


Context-dependent feature analysis with random forests

arXiv.org Machine Learning

Feature selection is often more complicated than identifying a single subset of input variables that would together explain the output. There may be interactions that depend on contextual information, i.e., variables that turn out to be relevant only in specific circumstances. In this setting, the contribution of this paper is to extend the random forest variable importances framework in order (i) to identify variables whose relevance is context-dependent and (ii) to characterize as precisely as possible the effect of contextual information on these variables. The usage and relevance of our framework for highlighting context-dependent variables is illustrated on both artificial and real datasets.
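
The flavour of context-dependence can be illustrated with a small, hedged sketch: the paper derives such importances within a single forest, whereas the toy example below (synthetic data, with an assumed binary context variable C) simply refits a forest on each context slice and compares the resulting MDI scores.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data where feature X1 matters only in context C = 1 and
# feature X0 only in context C = 0.
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 5))
C = rng.integers(0, 2, size=n)
y = np.where(C == 1, X[:, 1] > 0, X[:, 0] > 0).astype(int)

# Contrast global importances with importances conditioned on the context.
for name, mask in [("global", np.ones(n, bool)), ("C=0", C == 0), ("C=1", C == 1)]:
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X[mask], y[mask])
    print(name, np.round(forest.feature_importances_, 2))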


Random forests with random projections of the output space for high dimensional multi-label classification

arXiv.org Machine Learning

We adapt the idea of random projections, applied here to the output space, so as to enhance tree-based ensemble methods in the context of multi-label classification. We show how learning time complexity can be reduced without affecting the computational complexity or accuracy of predictions. We also show that random output-space projections can be used to reach different bias-variance tradeoffs over a broad panel of benchmark problems, and that this may lead to improved accuracy while significantly reducing the computational burden of the learning stage.
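
A hedged sketch of the projection trick on synthetic data follows; note that the paper keeps the original label vectors at the leaves for prediction, whereas the crude pseudo-inverse decoding below is only an illustrative stand-in.

import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import ExtraTreesRegressor

# Toy multi-label task with a 50-dimensional binary output space.
X, Y = make_multilabel_classification(n_samples=500, n_features=20,
                                      n_classes=50, random_state=0)

# Project outputs onto m << 50 random Gaussian directions and grow the
# ensemble on the projected targets: split scores are computed in the
# low-dimensional space, which reduces learning time.
rng = np.random.default_rng(0)
m = 10
P = rng.normal(size=(Y.shape[1], m)) / np.sqrt(m)
forest = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, Y @ P)

# Crude linear decoding back to the original output space (illustration only).
Y_hat = forest.predict(X[:5]) @ np.linalg.pinv(P)
print((Y_hat > 0.5).astype(int).shape)  # (5, 50)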


Classifying pairs with trees for supervised biological network inference

arXiv.org Machine Learning

Networks are ubiquitous in biology, and computational approaches have been largely investigated for their inference. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes. Here, we systematically investigate, theoretically and empirically, the exploitation of tree-based ensemble methods in the context of these two approaches for biological network inference. We first formalize the problem of network inference as classification of pairs, unifying in the process homogeneous and bipartite graphs and discussing two main sampling schemes. We then present the global and the local approaches, extending the latter for the prediction of interactions between two unseen network nodes, and discuss their specializations to tree-based ensemble methods, highlighting their interpretability and drawing links with clustering techniques. Extensive computational experiments carried out on various biological networks clearly show that these methods are competitive with existing approaches.
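
To make the global approach concrete, here is a minimal sketch on assumed synthetic node features and a synthetic interaction matrix; for a homogeneous, undirected graph one would also symmetrize the pair representation, which is glossed over here.

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Hypothetical node feature matrix and a synthetic "known" network.
rng = np.random.default_rng(0)
n_nodes, d = 100, 8
node_feats = rng.normal(size=(n_nodes, d))
adj = (node_feats @ node_feats.T > 2).astype(int)

# Global approach: a single model over pairs of nodes, each pair
# represented by the concatenation of its two node feature vectors.
pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
X = np.array([np.hstack([node_feats[i], node_feats[j]]) for i, j in pairs])
y = np.array([adj[i, j] for i, j in pairs])

clf = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
print("predicted interaction probability:", clf.predict_proba(X[:1])[0, 1])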


On the Construction of the Inclusion Boundary Neighbourhood for Markov Equivalence Classes of Bayesian Network Structures

arXiv.org Artificial Intelligence

The problem of learning Markov equivalence classes of Bayesian network structures may be solved by searching for the maximum of a scoring metric in a space of these classes. This paper deals with the definition and analysis of one such search space. We use a theoretically motivated neighbourhood, the inclusion boundary, and represent equivalence classes by essential graphs. We show that this search space is connected and that the score of the neighbours can be evaluated incrementally. We devise a practical way of building this neighbourhood for an essential graph that is purely graphical and does not explicitly refer to the underlying independences. We find that its size can be intractable, depending on the complexity of the essential graph of the equivalence class. The emphasis is put on the potential use of this space with greedy hill-climbing search.
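
Independently of the specific neighbourhood, the search procedure itself is plain greedy hill-climbing; a generic Python skeleton is sketched below, where in the paper's setting states would be essential graphs, neighbours(.) would enumerate the inclusion boundary, and score(.) would be the incrementally evaluated scoring metric. All three are assumed callables here, not implementations from the paper.

def hill_climb(start, neighbours, score):
    """Greedy hill-climbing: repeatedly move to the best-scoring
    neighbour until no neighbour improves on the current state."""
    current, current_score = start, score(start)
    while True:
        candidates = list(neighbours(current))
        if not candidates:
            return current
        best = max(candidates, key=score)
        if score(best) <= current_score:
            return current
        current, current_score = best, score(best)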