AITopics | mtry

Collaborating Authors

mtry

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Targeted tuning of random forests for quantile estimation and prediction intervals

Berkowitz, Matthew, Altman, Rachel MacKay, Loughin, Thomas M.

arXiv.org Machine LearningJul-3-2025

We present a novel tuning procedure for random forests (RFs) that improves the accuracy of estimated quantiles and produces valid, relatively narrow prediction intervals. While RFs are typically used to estimate mean responses (conditional on covariates), they can also be used to estimate quantiles by estimating the full distribution of the response. However, standard approaches for building RFs often result in excessively biased quantile estimates. To reduce this bias, our proposed tuning procedure minimizes "quantile coverage loss" (QCL), which we define as the estimated bias of the marginal quantile coverage probability estimate based on the out-of-bag sample. We adapt QCL tuning to handle censored data and demonstrate its use with random survival forests. We show that QCL tuning results in quantile estimates with more accurate coverage probabilities than those achieved using default parameter values or traditional tuning (using MSPE for uncensored data and C-index for censored data), while also reducing the estimated MSE of these coverage probabilities. We discuss how the superior performance of QCL tuning is linked to its alignment with the estimation goal. Finally, we explore the validity and width of prediction intervals created using this method.

artificial intelligence, machine learning, quantile, (19 more...)

arXiv.org Machine Learning

2507.0143

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (1.00)
Law > Civil Rights & Constitutional Law (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Review for NeurIPS paper: Joints in Random Forests

Neural Information Processing SystemsJan-26-2025, 06:15:57 GMT

While the approach is presented as a general generative model based on DT and RF, the paper fails to show its practical interest beyond handling missing values at test time. The possibility of using the approach for outlier detection is potentially interesting but the experiment in the paper is restricted to a single dataset and does not include any comparison with competitors except Gaussian KDE. Overall, the properties of GeDT and GeRF as general purpose density estimators are not really studied. My feeling is that because the tree partitioning is unchanged with respect to standard discriminative DT and RF, GeDT and GeRF are probably only appropriate in the context of tasks related to target predictions. In other tasks, I don't see why they would perform better than pure PC models or other methods mentioned in the related work section.

density model, gedt and gerf, neurips paper, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)

Add feedback

Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests

Liu, Brian, Mazumder, Rahul

arXiv.org Machine LearningFeb-19-2024

We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by Mentch and Zhou (2020), where the authors argue that random forests reduce effective degrees of freedom and only outperform bagging ensembles in low signal-to-noise ratio (SNR) settings, we explore how random forests can uncover patterns in the data missed by bagging. We empirically demonstrate that in the presence of such patterns, random forests reduce bias along with variance and increasingly outperform bagging ensembles when SNR is high. Our observations offer insights into the real-world success of random forests across a range of SNRs and enhance our understanding of the difference between random forests and bagging ensembles with respect to the randomization injected into each split. Our investigations also yield practical insights into the importance of tuning mtry in random forests.

ensemble, random forest, snr, (16 more...)

arXiv.org Machine Learning

2402.12668

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Adaptive Split Balancing for Optimal Random Forest

Zhang, Yuqian, Ji, Weijie, Bradic, Jelena

arXiv.org Machine LearningFeb-17-2024

While random forests are commonly used for regression problems, existing methods often lack adaptability in complex situations or lose optimality under simple, smooth scenarios. In this study, we introduce the adaptive split balancing forest (ASBF), capable of learning tree representations from data while simultaneously achieving minimax optimality under the Lipschitz class. To exploit higher-order smoothness levels, we further propose a localized version that attains the minimax rate under the H\"older class $\mathcal{H}^{q,\beta}$ for any $q\in\mathbb{N}$ and $\beta\in(0,1]$. Rather than relying on the widely-used random feature selection, we consider a balanced modification to existing approaches. Our results indicate that an over-reliance on auxiliary randomness may compromise the approximation power of tree models, leading to suboptimal results. Conversely, a less random, more balanced approach demonstrates optimality. Additionally, we establish uniform upper bounds and explore the application of random forests in average treatment effect estimation problems. Through simulation studies and real-data applications, we demonstrate the superior empirical performance of the proposed methods over existing random forests.

adaptive split, random forest, splitting direction, (16 more...)

arXiv.org Machine Learning

2402.11228

Country:

Asia > China (0.14)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Effect of hyperparameters on variable selection in random forests

Fouodo, Cesaire J. K., Kronziel, Lea L., König, Inke R., Szymczak, Silke

arXiv.org Machine LearningSep-13-2023

Random forests (RFs) are well suited for prediction modeling and variable selection in high-dimensional omics studies. The effect of hyperparameters of the RF algorithm on prediction performance and variable importance estimation have previously been investigated. However, how hyperparameters impact RF-based variable selection remains unclear. We evaluate the effects on the Vita and the Boruta variable selection procedures based on two simulation studies utilizing theoretical distributions and empirical gene expression data. We assess the ability of the procedures to select important variables (sensitivity) while controlling the false discovery rate (FDR). Our results show that the proportion of splitting candidate variables (mtry.prop) and the sample fraction (sample.fraction) for the training dataset influence the selection procedures more than the drawing strategy of the training datasets and the minimal terminal node size. A suitable setting of the RF hyperparameters depends on the correlation structure in the data. For weakly correlated predictor variables, the default value of mtry is optimal, but smaller values of sample.fraction result in larger sensitivity. In contrast, the difference in sensitivity of the optimal compared to the default value of sample.fraction is negligible for strongly correlated predictor variables, whereas smaller values than the default are better in the other settings. In conclusion, the default values of the hyperparameters will not always be suitable for identifying important variables. Thus, adequate values differ depending on whether the aim of the study is optimizing prediction performance or variable selection.

artificial intelligence, machine learning, predictor variable, (17 more...)

arXiv.org Machine Learning

2309.06943

Country:

Europe > Germany > Schleswig-Holstein (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Add feedback

Random survival forests with multivariate longitudinal endogenous covariates

Devaux, Anthony, Helmer, Catherine, Genuer, Robin, Proust-Lima, Cécile

arXiv.org Machine LearningFeb-9-2023

Predicting the individual risk of a clinical event using the complete patient history is still a major challenge for personalized medicine. Among the methods developed to compute individual dynamic predictions, the joint models have the assets of using all the available information while accounting for dropout. However, they are restricted to a very small number of longitudinal predictors. Our objective was to propose an innovative alternative solution to predict an event probability using a possibly large number of longitudinal predictors. We developed DynForest, an extension of competing-risk random survival forests that handles endogenous longitudinal predictors. At each node of the tree, the time-dependent predictors are translated into time-fixed features (using mixed models) to be used as candidates for splitting the subjects into two subgroups. The individual event probability is estimated in each tree by the Aalen-Johansen estimator of the leaf in which the subject is classified according to his/her history of predictors. The final individual prediction is given by the average of the tree-specific individual event probabilities. We carried out a simulation study to demonstrate the performances of DynForest both in a small dimensional context (in comparison with joint models) and in a large dimensional context (in comparison with a regression calibration method that ignores informative dropout). We also applied DynForest to (i) predict the individual probability of dementia in the elderly according to repeated measures of cognitive, functional, vascular and neuro-degeneration markers, and (ii) quantify the importance of each type of markers for the prediction of dementia. Implemented in the R package DynForest, our methodology provides a novel and appropriate solution for the prediction of events from any number of longitudinal endogenous predictors.

artificial intelligence, machine learning, predictor, (19 more...)

arXiv.org Machine Learning

2208.05801

Country:

North America > United States > New York (0.04)
Europe > France > Occitanie > Hérault > Montpellier (0.04)
Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Dementia (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Random Forests for time-fixed and time-dependent predictors: The DynForest R package

Devaux, Anthony, Proust-Lima, Cécile, Genuer, Robin

arXiv.org Artificial IntelligenceFeb-6-2023

The R package DynForest implements random forests for predicting a categorical or a (multiple causes) time-to-event outcome based on time-fixed and time-dependent predictors. Through the random forests, the time-dependent predictors can be measured with error at subject-specific times, and they can be endogeneous (i.e., impacted by the outcome process). They are modeled internally using flexible linear mixed models (thanks to lcmm package) with time-associations pre-specified by the user. DynForest computes dynamic predictions that take into account all the information from time-fixed and time-dependent predictors. DynForest also provides information about the most predictive variables using variable importance and minimal depth. Variable importance can also be computed on groups of variables. To display the results, several functions are available such as summary and plot functions. This paper aims to guide the user with a step-by-step example of the different functions for fitting random forests within DynForest.

artificial intelligence, machine learning, predictor, (20 more...)

arXiv.org Artificial Intelligence

2302.0267

Country:

Europe > Austria > Vienna (0.14)
Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Generalizing Gain Penalization for Feature Selection in Tree-based Models

Wundervald, Bruna, Parnell, Andrew, Domijan, Katarina

arXiv.org Machine LearningJun-12-2020

We develop a new approach for feature selection via gain penalization in tree-based models. First, we show that previous methods do not perform sufficient regularization and often exhibit sub-optimal out-of-sample performance, especially when correlated features are present. Instead, we develop a new gain penalization idea that exhibits a general local-global regularization for tree-based models. The new method allows for more flexibility in the choice of feature-specific importance weights. We validate our method on both simulated and real data and implement itas an extension of the popular R package ranger.

artificial intelligence, machine learning, random forest, (18 more...)

arXiv.org Machine Learning

2006.07515

Country:

North America > United States (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success

Mentch, Lucas, Zhou, Siyu

arXiv.org Machine LearningOct-31-2019

Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings. Despite their empirical success as well as a bevy of recent work investigating their statistical properties, a full and satisfying explanation for their success has yet to be put forth. Here we aim to take a step forward in this direction by demonstrating that the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal-to-noise ratio (SNR) settings. Specifically, from a model-complexity perspective, we show that the mtry parameter in random forests serves much the same purpose as the shrinkage penalty in explicitly regularized regression procedures like lasso and ridge regression. To highlight this point, we design a randomized linear-model-based forward selection procedure intended as an analogue to tree-based random forests and demonstrate its surprisingly strong empirical performance. Numerous demonstrations on both real and synthetic data are provided.

mtry, procedure, random forest, (16 more...)

arXiv.org Machine Learning

1911.0019

Country:

Oceania > Australia > Tasmania (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.86)

Add feedback

Making Sense of Random Forest Probabilities: a Kernel Perspective

Olson, Matthew A., Wyner, Abraham J.

arXiv.org Machine LearningDec-14-2018

A random forest is a popular tool for estimating probabilities in machine learning classification tasks. However, the means by which this is accomplished is unprincipled: one simply counts the fraction of trees in a forest that vote for a certain class. In this paper, we forge a connection between random forests and kernel regression. This places random forest probability estimation on more sound statistical footing. As part of our investigation, we develop a model for the proximity kernel and relate it to the geometry and sparsity of the estimation problem. We also provide intuition and recommendations for tuning a random forest to improve its probability estimates.

artificial intelligence, machine learning, random forest, (18 more...)

arXiv.org Machine Learning

1812.05792

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback