A Nonparametric Test of Dependence Based on Ensemble of Decision Trees
A general-purpose method for detecting statistical dependence, or correlation, between random variables is valuable across a wide array of sciences and applications (Li, 2000; Martínez-Gómez et al., 2014; Mahdi et al., 2012). Linear correlation (Pearson, 1920) is one of the oldest statistical methods still in wide use today. Although the assumption of linearity is not always realistic, the method remains popular because of its ease of computation, simplicity, interpretability, and high power when the linearity assumption holds. Several approaches have been proposed to quantify correlation in the general case, for more complex relationships and under less stringent assumptions. Examples include kernel-based correlation (Hardoon et al., 2004; Chang et al., 2013), copula methods (Poczos et al., 2012), distance correlation (Székely et al., 2007; Székely and Rizzo, 2009), and discretization-based mutual information (MI) methods (Steuer et al., 2002) such as the maximal information coefficient (MIC) (Reshef et al., 2011). Shortcomings found in some of the existing methods include low statistical power, high computational demand, lack of intuitive interpretability, and the lack of a known distribution of the coefficient under independence, which would make it possible to compute a statistical confidence. More thorough discussion of the pros and cons of these and other methods can be found in several studies (de Siqueira Santos et al., 2014; N. Reshef et al., 2018).
Jul-23-2020