Goto

Collaborating Authors

 relevant variable


Feature selection in functional data classification with recursive maxima hunting

Neural Information Processing Systems

Dimensionality reduction is one of the key issues in the design of effective machine learning methods for automatic induction. In this work, we introduce recursive maxima hunting (RMH) for variable selection in classification problems with functional data. In this context, variable selection techniques are especially attractive because they reduce the dimensionality, facilitate the interpretation and can improve the accuracy of the predictive models. The method, which is a recursive extension of maxima hunting (MH), performs variable selection by identifying the maxima of a relevance function, which measures the strength of the correlation of the predictor functional variable with the class label. At each stage, the information associated with the selected variable is removed by subtracting the conditional expectation of the process. The results of an extensive empirical evaluation are used to illustrate that, in the problems investigated, RMH has comparable or higher predictive accuracy than standard dimensionality reduction techniques, such as PCA and PLS, and state-of-the-art feature selection methods for functional data, such as maxima hunting.


Estimation in high-dimensional linear regression: Post-Double-Autometrics as an alternative to Post-Double-Lasso

arXiv.org Machine Learning

Post-Double-Lasso is becoming the most popular method for estimating linear regression models with many covariates when the purpose is to obtain an accurate estimate of a parameter of interest, such as an average treatment effect. However, this method can suffer from substantial omitted variable bias in finite sample. We propose a new method called Post-Double-Autometrics, which is based on Autometrics, and show that this method outperforms Post-Double-Lasso.


A Stable Lasso

arXiv.org Machine Learning

The Lasso has been widely used as a method for variable selection, valued for its simplicity and empirical performance. However, Lasso's selection stability deteriorates in the presence of correlated predictors. Several approaches have been developed to mitigate this limitation. In this paper, we provide a brief review of existing approaches, highlighting their limitations. We then propose a simple technique to improve the selection stability of Lasso by integrating a weighting scheme into the Lasso penalty function, where the weights are defined as an increasing function of a correlation-adjusted ranking that reflects the predictive power of predictors. Empirical evaluations on both simulated and real-world datasets demonstrate the efficacy of the proposed method. Additional numerical results demonstrate the effectiveness of the proposed approach in stabilizing other regularization-based selection methods, indicating its potential as a general-purpose solution.


Variational Garrote for Statistical Physics-based Sparse and Robust Variable Selection

arXiv.org Artificial Intelligence

Identifying relationships between variables is a fundamental task in science. Among various approaches, linear regression plays a central role in linking explanatory variables to dependent variables in statistical modeling [1, 2]. Linear regression is useful in physics [3, 4] for extracting equations of motion from time series data [5] and for predicting trends in dynamical systems [6], but its simplicity, interpretability, and predictive power make it a cornerstone of data analysis [7], forecasting [8], and decision-making [9] in many fields. Moreover, linear regression forms the foundation for many advanced statistical and machine learning models [10], including logistic regression [11], support vector machines [12], and generalized linear models [13]. Extensions of linear regression often aim to capture more complex relationships by introducing higher-order polynomial terms or additional nonlinear transformations. Modern developments in machine learning have enabled the training of deep and highly overparameterized models capable of modeling intricate patterns far beyond the reach of simple linear approaches. In particular, deep learning models can be interpreted as sophisticated forms of nonlinear regression [14], capable of approximating complex functions with high flexibility. Despite its utility, linear regression struggles with modern high-dimensional datasets where only a small subset of variables is truly informative.


Learning Juntas under Markov Random Fields

arXiv.org Artificial Intelligence

We give an algorithm for learning $O(\log n)$ juntas in polynomial-time with respect to Markov Random Fields (MRFs) in a smoothed analysis framework where only the external field has been randomly perturbed. This is a broad generalization of the work of Kalai and Teng, who gave an algorithm that succeeded with respect to smoothed product distributions (i.e., MRFs whose dependency graph has no edges). Our algorithm has two phases: (1) an unsupervised structure learning phase and (2) a greedy supervised learning algorithm. This is the first example where algorithms for learning the structure of an undirected graphical model lead to provably efficient algorithms for supervised learning.


A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random

arXiv.org Machine Learning

Model-based clustering integrated with variable selection is a powerful tool for uncovering latent structures within complex data. However, its effectiveness is often hindered by challenges such as identifying relevant variables that define heterogeneous subgroups and handling data that are missing not at random, a prevalent issue in fields like transcriptomics. While several notable methods have been proposed to address these problems, they typically tackle each issue in isolation, thereby limiting their flexibility and adaptability. This paper introduces a unified framework designed to address these challenges simultaneously. Our approach incorporates a data-driven penalty matrix into penalized clustering to enable more flexible variable selection, along with a mechanism that explicitly models the relationship between missingness and latent class membership. We demonstrate that, under certain regularity conditions, the proposed framework achieves both asymptotic consistency and selection consistency, even in the presence of missing data. This unified strategy significantly enhances the capability and efficiency of model-based clustering, advancing methodologies for identifying informative variables that define homogeneous subgroups in the presence of complex missing data patterns. The performance of the framework, including its computational efficiency, is evaluated through simulations and demonstrated using both synthetic and real-world transcriptomic datasets.


Speeding up approximate MAP by applying domain knowledge about relevant variables

arXiv.org Artificial Intelligence

The MAP problem in Bayesian networks is notoriously intractable, even when approximated. In an earlier paper we introduced the Most Frugal Explanation heuristic approach to solving MAP, by partitioning the set of intermediate variables (neither observed nor part of the MAP variables) into a set of relevant variables, which are marginalized out, and irrelevant variables, which will be assigned a sampled value from their domain. In this study we explore whether knowledge about which variables are relevant for a particular query (i.e., domain knowledge) speeds up computation sufficiently to beat both exact MAP as well as approximate MAP while giving reasonably accurate results. Our results are inconclusive, but also show that this probably depends on the specifics of the MAP query, most prominently the number of MAP variables.


Approximating the Number of Relevant Variables in a Parity Implies Proper Learning

arXiv.org Artificial Intelligence

Consider the model where we can access a parity function through random uniform labeled examples in the presence of random classification noise. In this paper, we show that approximating the number of relevant variables in the parity function is as hard as properly learning parities. More specifically, let $\gamma:{\mathbb R}^+\to {\mathbb R}^+$, where $\gamma(x) \ge x$, be any strictly increasing function. In our first result, we show that from any polynomial-time algorithm that returns a $\gamma$-approximation, $D$ (i.e., $\gamma^{-1}(d(f)) \leq D \leq \gamma(d(f))$), of the number of relevant variables~$d(f)$ for any parity $f$, we can, in polynomial time, construct a solution to the long-standing open problem of polynomial-time learning $k(n)$-sparse parities (parities with $k(n)\le n$ relevant variables), where $k(n) = \omega_n(1)$. In our second result, we show that from any $T(n)$-time algorithm that, for any parity $f$, returns a $\gamma$-approximation of the number of relevant variables $d(f)$ of $f$, we can, in polynomial time, construct a $poly(\Gamma(n))T(\Gamma(n)^2)$-time algorithm that properly learns parities, where $\Gamma(x)=\gamma(\gamma(x))$. If $T(\Gamma(n)^2)=\exp({o(n/\log n)})$, this would resolve another long-standing open problem of properly learning parities in the presence of random classification noise in time $\exp({o(n/\log n)})$.


Learning Causal Abstractions of Linear Structural Causal Models

arXiv.org Artificial Intelligence

The need for modelling causal knowledge at different levels of granularity arises in several settings. Causal Abstraction provides a framework for formalizing this problem by relating two Structural Causal Models at different levels of detail. Despite increasing interest in applying causal abstraction, e.g. in the interpretability of large machine learning models, the graphical and parametrical conditions under which a causal model can abstract another are not known. Furthermore, learning causal abstractions from data is still an open problem. In this work, we tackle both issues for linear causal models with linear abstraction functions. First, we characterize how the low-level coefficients and the abstraction function determine the high-level coefficients and how the high-level model constrains the causal ordering of low-level variables. Then, we apply our theoretical results to learn high-level and low-level causal models and their abstraction function from observational data. In particular, we introduce Abs-LiNGAM, a method that leverages the constraints induced by the learned high-level model and the abstraction function to speedup the recovery of the larger low-level model, under the assumption of non-Gaussian noise terms. In simulated settings, we show the effectiveness of learning causal abstractions from data and the potential of our method in improving scalability of causal discovery.


e3796ae838835da0b6f6ea37bcf8bcb7-Reviews.html

Neural Information Processing Systems

Variable importance measures are often used to highlight key predictor variables in tree-based ensemble models. However, there has generally been a lack of a theoretical understanding of these measures. In this paper, the authors study the theoretical properties of the Mean Decrease Impurity (MDI) importance measures (such as the Gini importance). Through most of the paper they use the Shanon entropy as the impurity measure but show that the theorems and results are applicable to any impurity measure. They begin with an asymptotic analysis of totally randomized fully-developed tree ensembles learned using an infinitely large ensemble.