
Sparse Learning with CART

Neural Information Processing Systems

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to a split point that maximizes the reduction in sum of squares error (the impurity) along a particular variable. This paper aims to study the statistical properties of regression trees constructed with CART. In doing so, we find that the training error is governed by the Pearson correlation between the optimal decision stump and response data in each node, which we bound by constructing a prior distribution on the split points and solving a nonlinear optimization problem. We leverage this connection between the training error and Pearson correlation to show that CART with cost-complexity pruning achieves an optimal complexity/goodness-of-fit tradeoff when the depth scales with the logarithm of the sample size. Data-dependent quantities, which adapt to the dimensionality and latent structure of the regression model, are seen to govern the rates of convergence of the prediction error.
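As a rough illustration of the split rule described in this abstract, the sketch below scans candidate thresholds on a single variable and keeps the one with the largest reduction in sum of squares error. It then checks a standard least-squares identity: the relative impurity reduction equals the squared Pearson correlation between the fitted decision stump and the response. The function names (`sse`, `best_split`, `pearson`) and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (the node impurity)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Scan midpoints between sorted x values; return (threshold, SSE reduction)."""
    pairs = sorted(zip(x, y))
    ys = [p[1] for p in pairs]
    parent = sse(ys)
    best_thr, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates tied x values
        gain = parent - sse(ys[:i]) - sse(ys[i:])
        if gain > best_gain:
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_thr, best_gain


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


# Toy data (made up): the response jumps between x = 3 and x = 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 0.9, 1.0, 3.0, 3.2, 2.8]

thr, gain = best_split(x, y)
left = [v for u, v in zip(x, y) if u <= thr]
right = [v for u, v in zip(x, y) if u > thr]
# The decision stump predicts the daughter-node mean on each side of the split.
stump = [fmean(left) if u <= thr else fmean(right) for u in x]
rho = pearson(stump, y)
```

On this toy data, `rho ** 2` and `gain / sse(y)` agree up to floating-point error, which is the single-node version of the link between training error and correlation that the abstract refers to.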


Handling Missing Data in Probabilistic Regression Trees: Methods and Implementation in R

Prass, Taiane Schaedler, Neimaier, Alisson Silva, Pumi, Guilherme

arXiv.org Machine Learning

Probabilistic Regression Trees (PRTrees) generalize traditional decision trees by incorporating probability functions that associate each data point with different regions of the tree, providing smooth decisions and continuous responses. This paper introduces an adaptation of PRTrees capable of handling missing values in covariates through three distinct approaches: (i) a uniform probability method, (ii) a partial observation approach, and (iii) a dimension-reduced smoothing technique. The proposed methods preserve the interpretability properties of PRTrees while extending their applicability to incomplete datasets. Simulation studies under MCAR conditions demonstrate the relative performance of each approach, including comparisons with traditional regression trees on smooth function estimation tasks. The proposed methods, together with the original version, have been developed in R with highly optimized routines and are distributed in the PRTree package, publicly available on CRAN. In this paper we also present and discuss the main functionalities of the PRTree package, providing researchers and practitioners with new tools for incomplete data analysis.
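To make the idea of "smooth decisions and continuous responses" concrete, here is a minimal sketch of a single soft split on one fully observed covariate. It assumes a Gaussian probability function with a hypothetical smoothing parameter `sigma`; the function names are illustrative only and are not the API of the PRTree package, and the sketch does not implement any of the three missing-data approaches.

```python
from statistics import NormalDist


def soft_memberships(x, threshold, sigma):
    """Probabilities that x belongs to the left and right region of one split.

    Assumption: x is perturbed by Gaussian noise with standard deviation
    sigma, so P(left) is the Gaussian probability of falling below threshold.
    """
    p_left = NormalDist(mu=x, sigma=sigma).cdf(threshold)
    return p_left, 1.0 - p_left


def soft_predict(x, threshold, sigma, left_value, right_value):
    """Prediction as the membership-weighted average of the region values."""
    p_left, p_right = soft_memberships(x, threshold, sigma)
    return p_left * left_value + p_right * right_value


# A point exactly on the threshold belongs to both regions equally.
on_boundary = soft_predict(0.0, threshold=0.0, sigma=1.0, left_value=1.0, right_value=3.0)
# A point far to the left is assigned almost entirely to the left region.
far_left = soft_predict(-10.0, threshold=0.0, sigma=1.0, left_value=1.0, right_value=3.0)
```

Unlike a hard split, the prediction changes continuously as `x` crosses the threshold, and shrinking `sigma` brings the behaviour closer to that of a traditional regression tree.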


we introduce task selection based on prior experience into a meta-learning algorithm by conceptualizing the learner and

Neural Information Processing Systems

We highly appreciate the reviewers' time, efforts, and valuable suggestions! R3, R4 asked for further clarification on the differences between existing work and our approach. PAML and ACL can be seen as complementary approaches, e.g., PAML might be used to R1 also mentions that only one of the environments is learned from pixel data. Lastly, we will add an analysis of the settings fully observed 4.1 and pixel-descriptor 4.4. With space constraints in mind and since our work's goal is to incorporate active ML approach used in this work in Section 2. Control signals.


Maximize margins for robust splicing detection

de Kergunic, Julien Simon, Abecidan, Rony, Bas, Patrick, Itier, Vincent

arXiv.org Artificial Intelligence

Despite recent progress in splicing detection, deep learning-based forensic tools remain difficult to deploy in practice due to their high sensitivity to training conditions. Even mild post-processing applied to evaluation images can significantly degrade detector performance, raising concerns about their reliability in operational contexts. In this work, we show that the same deep architecture can react very differently to unseen post-processing depending on the learned weights, despite achieving similar accuracy on in-distribution test data. This variability stems from differences in the latent spaces induced by training, which affect how samples are separated internally. Our experiments reveal a strong correlation between the distribution of latent margins and a detector's ability to generalize to post-processed images. Based on this observation, we propose a practical strategy for building more robust detectors: train several variants of the same model under different conditions, and select the one that maximizes latent margins.
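The selection strategy in the last sentence can be sketched as follows. The abstract does not define "latent margin", so this sketch assumes one plausible reading: the signed distance from each latent vector to a linear decision boundary, summarized per model by the median. All names and the toy latent vectors are hypothetical.

```python
from math import sqrt
from statistics import median


def latent_margins(latents, labels, w, b):
    """Signed distance of each latent vector to the hyperplane w.z + b = 0.

    Labels are +1 / -1; a positive margin means a correct classification.
    """
    norm = sqrt(sum(c * c for c in w))
    return [
        lab * (sum(wi * zi for wi, zi in zip(w, z)) + b) / norm
        for z, lab in zip(latents, labels)
    ]


def select_by_margin(candidates):
    """Return the name of the candidate with the largest median latent margin."""
    return max(candidates, key=lambda name: median(candidates[name]))


# Two made-up variants of the same model: both classify every sample
# correctly, but variant_a separates the classes more widely.
labels = [1, 1, -1, -1]
variant_a = latent_margins(
    [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, 1.0)], labels, (1.0, 0.0), 0.0
)
variant_b = latent_margins(
    [(0.5, 0.0), (0.4, 1.0), (-0.5, 0.0), (-0.6, 1.0)], labels, (1.0, 0.0), 0.0
)
chosen = select_by_margin({"variant_a": variant_a, "variant_b": variant_b})
```

Both toy variants reach the same in-distribution accuracy, so accuracy alone cannot separate them; the margin summary is what breaks the tie here.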


A Comparative Study of Classification Algorithms: Statistical, Machine Learning and Neural Network

King, R. D., Henery, R.

AI Classics

The aim of the StatLog project is to compare the performance of statistical, machine learning, and neural network algorithms on large real-world problems. This paper describes the completed work on classification in the StatLog project. Classification is defined here as the problem of estimating, given a set of multivariate data with assigned classes, the probability that a new example sampled from the same source belongs to a pre-defined class, based on the set of attributes describing it. We gathered together a representative collection of algorithms from statistics (Naive Bayes, K-nearest Neighbour, Kernel density, Linear discriminant, Quadratic discriminant, Logistic regression, Projection pursuit, Bayesian networks), machine learning (CART, C4.5, NewID, AC2, CAL5, CN2, ITrule; only propositional symbolic algorithms were considered), and neural networks (Backpropagation, Radial basis functions, Kohonen).