Sparse Learning with CART
Decision trees with binary splits are commonly constructed using the Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to a split point that maximizes the reduction in sum-of-squares error (the impurity) along a particular variable. This paper studies the statistical properties of regression trees constructed with CART. In doing so, we find that the training error is governed by the Pearson correlation between the optimal decision stump and the response data in each node, which we bound by constructing a prior distribution on the split points and solving a nonlinear optimization problem. We leverage this connection between the training error and the Pearson correlation to show that CART with cost-complexity pruning achieves an optimal complexity/goodness-of-fit tradeoff when the depth scales with the logarithm of the sample size. Data-dependent quantities, which adapt to the dimensionality and latent structure of the regression model, are seen to govern the rates of convergence of the prediction error.
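The link between a stump's impurity reduction and the Pearson correlation can be checked numerically. A regression stump (left/right means around a threshold) is a least-squares fit onto an indicator variable, so its impurity reduction equals rho squared times the parent node's sum of squares, where rho is the Pearson correlation between the stump's fitted values and the response. The sketch below is illustrative; the data and threshold are made up, not from the paper.

```python
# Illustrative check of the identity: impurity reduction of a regression
# stump = rho^2 * (parent sum of squares), where rho is the Pearson
# correlation between the stump's fitted values and the response.

def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def stump_fit(xs, ys, t):
    """Fitted values of the stump splitting at threshold t: each side of
    the split predicts its own mean."""
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    lm, rm = sum(left) / len(left), sum(right) / len(right)
    return [lm if x <= t else rm for x in xs]

def impurity_reduction(xs, ys, t):
    """Parent SSE minus the combined SSE of the two daughter nodes."""
    my = sum(ys) / len(ys)
    parent = sum((y - my) ** 2 for y in ys)
    fit = stump_fit(xs, ys, t)
    child = sum((y - f) ** 2 for y, f in zip(ys, fit))
    return parent - child

# Toy data and threshold (arbitrary, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.0, 0.9, 1.2, 1.0]
t = 1.5
rho = pearson(stump_fit(xs, ys, t), ys)
my = sum(ys) / len(ys)
parent = sum((y - my) ** 2 for y in ys)
# The identity holds up to floating-point error.
assert abs(impurity_reduction(xs, ys, t) - rho ** 2 * parent) < 1e-9
```

The identity follows because the stump is an orthogonal projection of the response onto the span of a constant and the split indicator, so the usual R-squared decomposition applies node by node.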
Handling Missing Data in Probabilistic Regression Trees: Methods and Implementation in R
Prass, Taiane Schaedler, Neimaier, Alisson Silva, Pumi, Guilherme
Probabilistic Regression Trees (PRTrees) generalize traditional decision trees by incorporating probability functions that associate each data point with different regions of the tree, providing smooth decisions and continuous responses. This paper introduces an adaptation of PRTrees capable of handling missing values in covariates through three distinct approaches: (i) a uniform probability method, (ii) a partial observation approach, and (iii) a dimension-reduced smoothing technique. The proposed methods preserve the interpretability properties of PRTrees while extending their applicability to incomplete datasets. Simulation studies under MCAR conditions demonstrate the relative performance of each approach, including comparisons with traditional regression trees on smooth function estimation tasks. The proposed methods, together with the original version, have been developed in R with highly optimized routines and are distributed in the PRTree package, publicly available on CRAN. In this paper we also present and discuss the main functionalities of the PRTree package, providing researchers and practitioners with new tools for incomplete data analysis.
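The smooth-decision idea behind PRTrees, and the uniform-probability treatment of missing covariates, can be sketched for a single split. This is a simplified reading, not the PRTree package's exact formulation: the kernel choice and helper names below are our own assumptions.

```python
import math

# Simplified sketch of one probabilistic split: instead of a hard indicator
# 1{x <= t}, each point contributes to both regions with weights from a
# smooth kernel, so the fitted response is continuous in x.

def smooth_weight(x, t, h):
    """Probability of falling in the left region; a logistic kernel with
    bandwidth h (a hypothetical choice -- other kernels are possible)."""
    return 1.0 / (1.0 + math.exp((x - t) / h))

def prtree_stump_predict(x, t, h, left_mean, right_mean):
    """Smooth prediction: probability-weighted mix of the region means."""
    p = smooth_weight(x, t, h)
    return p * left_mean + (1 - p) * right_mean

def predict_with_missing(x, t, h, left_mean, right_mean):
    """A simplified version of the uniform-probability idea (approach (i)):
    when x is missing, assign equal weight to both regions."""
    p = 0.5 if x is None else smooth_weight(x, t, h)
    return p * left_mean + (1 - p) * right_mean
```

At the threshold itself the prediction is the average of the two region means, and far from the threshold it approaches the ordinary hard-split prediction; a missing value lands exactly at that midpoint under the uniform rule.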
We introduce task selection based on prior experience into a meta-learning algorithm by conceptualizing the learner and
We highly appreciate the reviewers' time, effort, and valuable suggestions! R3 and R4 asked for further clarification on the differences between existing work and our approach. PAML and ACL can be seen as complementary approaches, e.g., PAML might be used to R1 also mentions that only one of the environments is learned from pixel data. Lastly, we will add an analysis of the fully observed (4.1) and pixel-descriptor (4.4) settings. With space constraints in mind, and since our work's goal is to incorporate active ML, approach used in this work in Section 2. Control signals.
Maximize margins for robust splicing detection
de Kergunic, Julien Simon, Abecidan, Rony, Bas, Patrick, Itier, Vincent
Despite recent progress in splicing detection, deep learning-based forensic tools remain difficult to deploy in practice due to their high sensitivity to training conditions. Even mild post-processing applied to evaluation images can significantly degrade detector performance, raising concerns about their reliability in operational contexts. In this work, we show that the same deep architecture can react very differently to unseen post-processing depending on the learned weights, despite achieving similar accuracy on in-distribution test data. This variability stems from differences in the latent spaces induced by training, which affect how samples are separated internally. Our experiments reveal a strong correlation between the distribution of latent margins and a detector's ability to generalize to post-processed images. Based on this observation, we propose a practical strategy for building more robust detectors: train several variants of the same model under different conditions, and select the one that maximizes latent margins.
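The proposed selection rule can be sketched as follows: score each trained variant by the distribution of latent margins on held-out in-distribution data, and keep the variant with the largest margins. In this simplified sketch a "detector" is reduced to a linear head (w, b) over latent features z, and the margin distribution is summarized by its median; both are illustrative choices, not the paper's exact protocol.

```python
import math

def latent_margin(z, y, w, b):
    """Signed distance of latent point z (label y in {-1, +1}) to the
    hyperplane w.z + b = 0; positive means correctly classified."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * zi for wi, zi in zip(w, z)) + b) / norm

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

def select_most_robust(variants, data):
    """variants: list of (w, b) heads; data: list of (z, y) samples.
    Returns the index of the variant with the largest median latent margin."""
    scores = [median([latent_margin(z, y, w, b) for z, y in data])
              for w, b in variants]
    return max(range(len(scores)), key=lambda i: scores[i])
```

All candidate variants may classify the held-out data equally well; the margin criterion breaks the tie in favor of the latent geometry that tolerates the most perturbation.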
- North America > United States > Maine (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
FIGS: Attaining XGBoost-level performance with the interpretability and speed of CART
Recent machine-learning advances have led to increasingly complex predictive models, often at the cost of interpretability. Interpretability is often essential, particularly in high-stakes applications such as clinical decision-making; interpretable models help practitioners identify errors, leverage domain knowledge, and make speedy predictions. In this blog post we'll cover FIGS, a new method for fitting an interpretable model that takes the form of a sum of trees. Real-world experiments and theoretical results show that FIGS can effectively adapt to a wide range of structure in data, achieving state-of-the-art performance in several settings, all without sacrificing interpretability. Intuitively, FIGS works by extending CART, a typical greedy algorithm for growing a decision tree, to consider growing a sum of trees simultaneously (see Fig 1).
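The "sum of trees" intuition can be sketched in a heavily simplified form: each step fits one depth-1 tree (a stump) to the residuals of the current sum. Real FIGS is more general, since at every step it weighs extending a leaf of any existing tree against starting a new one, but the residual-fitting loop below captures the additive structure.

```python
# Heavily simplified sketch of the additive structure behind FIGS:
# greedily fit stumps to the residuals of the running sum.

def fit_stump(xs, ys):
    """Best single-threshold split on one feature, minimizing SSE.
    Returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no split between equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        s = (sum((y - lm) ** 2 for y in left)
             + sum((y - rm) ** 2 for y in right))
        if best is None or s < best[0]:
            best = (s, (pairs[i][0] + pairs[i - 1][0]) / 2, lm, rm)
    return best[1:]

def fit_sum_of_stumps(xs, ys, n_stumps=3):
    """Grow the additive model one stump at a time on current residuals."""
    stumps, resid = [], list(ys)
    for _ in range(n_stumps):
        t, lm, rm = fit_stump(xs, resid)
        stumps.append((t, lm, rm))
        resid = [r - (lm if x <= t else rm) for x, r in zip(xs, resid)]
    return stumps

def predict(stumps, x):
    """Prediction of the sum of stumps at a single point x."""
    return sum(lm if x <= t else rm for t, lm, rm in stumps)
```

Because later stumps see only what earlier ones failed to explain, the components stay small and individually readable, which is the interpretability FIGS is after.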
eBay Uses Machine Learning to Refine Promoted Listings
Online marketplace eBay incorporated additional buying signals such as "Add to Watchlist," "Make Offer," and "Add to Cart" into its machine-learning model to improve the relevance of recommended ad listings, based on the items initially searched for. Chen Xue goes into great detail in this recent article. eBay's Promoted Listings Standard (PLS) is a paid option for sellers. With one option, PLSIM, eBay's recommendation engines suggest sponsored items similar to something a potential buyer just clicked on. PLSIM is paid on a cost-per-acquisition (CPA) model (the seller pays eBay only when a sale is made), which strongly motivates eBay to build the most effective model for promoting the best listings.
- Information Technology > Services (1.00)
- Consumer Products & Services (1.00)
Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods
Agarwal, Abhineet, Tan, Yan Shuo, Ronen, Omer, Singh, Chandan, Yu, Bin
Tree-based models such as decision trees and random forests (RF) are a cornerstone of modern machine-learning practice. To mitigate overfitting, trees are typically regularized by a variety of techniques that modify their structure (e.g. pruning). We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure, and instead regularizes the tree by shrinking the prediction over each node towards the sample means of its ancestors. The amount of shrinkage is controlled by a single regularization parameter and the number of data points in each ancestor. Since HS is a post-hoc method, it is extremely fast, compatible with any tree-growing algorithm, and can be used synergistically with other regularization techniques. Extensive experiments over a wide variety of real-world datasets show that HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques. Moreover, we find that applying HS to each tree in an RF often improves accuracy, as well as its interpretability, by simplifying and stabilizing its decision boundaries and SHAP values. We further explain the success of HS in improving prediction performance by showing its equivalence to ridge regression on a (supervised) basis constructed of decision stumps associated with the internal nodes of a tree. All code and models are released in a full-fledged package available on GitHub (github.com/csinva/imodels).
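The shrinkage described above has a simple telescoping form along the root-to-leaf path: each jump between successive node means is damped by a factor that grows as the parent's sample count shrinks. The sketch below assumes the path is given as (node_mean, node_count) pairs; variable names are our own, not the package's API.

```python
# Hedged sketch of the Hierarchical Shrinkage prediction for one query point.
# node_path: list of (node_mean, node_count) pairs from root to leaf.
# lam: the single regularization parameter.

def hs_predict(node_path, lam):
    """Telescoping HS prediction: start from the root mean, then add each
    child-minus-parent mean difference shrunk by 1 / (1 + lam / N(parent))."""
    pred = node_path[0][0]  # root mean
    for (parent_mean, parent_n), (child_mean, _) in zip(node_path,
                                                        node_path[1:]):
        pred += (child_mean - parent_mean) / (1 + lam / parent_n)
    return pred
```

With lam = 0 the sum telescopes back to the ordinary leaf mean, and as lam grows the prediction collapses toward the root mean; deep splits backed by few samples are damped the most, which is what stabilizes the decision boundaries.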
- North America > United States > California > Alameda County > Berkeley (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > New York (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
A Comparative Study of Classification Algorithms: Statistical, Machine Learning and Neural Network
King, R. D., Henery, R.
The aim of the StatLog project is to compare the performance of statistical, machine learning, and neural network algorithms on large real-world problems. This paper describes the completed work on classification in the StatLog project. Classification is here defined as the problem of estimating, from a set of attributes describing a new example sampled from the same source as a set of multivariate data with assigned classes, the probability that the example belongs to a pre-defined class. We gathered together a representative collection of algorithms from statistics (Naive Bayes, K-nearest Neighbour, Kernel density, Linear discriminant, Quadratic discriminant, Logistic regression, Projection pursuit, Bayesian networks), machine learning (CART, C4.5, NewID, AC2, CAL5, CN2, ITrule -- only propositional symbolic algorithms were considered), and neural networks (Backpropagation, Radial basis functions, Kohonen).
- North America > United States > California (0.46)
- Europe > United Kingdom > Scotland (0.28)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government (0.68)
- Government > Regional Government > North America Government > United States Government (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)