 Decision Tree Learning


How Bayesian additive regression trees (BART) are used, part 3 (Machine Learning)

#artificialintelligence

Abstract: Ensemble methods for regression have been very successful at producing high-accuracy predictions. Examples include bagging, random forests, boosting, BART (Bayesian additive regression trees), and their variants. In this paper, we propose a new perspective, named variable grouping, to enhance predictive performance. The main idea is to seek a grouping of variables such that there is no nonlinear interaction term between variables of different groups. Given a sum-of-learners model, each learner is then responsible for only one group of variables, which is more efficient for modeling nonlinear interactions.


BELLATREX: Building Explanations through a LocaLly AccuraTe Rule EXtractor

arXiv.org Artificial Intelligence

Tree-ensemble algorithms, such as random forest, are effective machine learning methods popular for their flexibility, high performance, and robustness to overfitting. However, since multiple learners are combined, they are not as interpretable as a single decision tree. In this work we propose a novel method, Building Explanations through a LocaLly AccuraTe Rule EXtractor (Bellatrex), which is able to explain the forest prediction for a given test instance with only a few diverse rules. Starting from the decision trees generated by a random forest, our method 1) pre-selects a subset of the rules used to make the prediction, 2) creates a vector representation of such rules, 3) projects them to a low-dimensional space, and 4) clusters such representations to pick a rule from each cluster to explain the instance prediction. We test the effectiveness of Bellatrex on 89 real-world datasets and we demonstrate the validity of our method for binary classification, regression, multi-label classification and time-to-event tasks. To the best of our knowledge, it is the first time that an interpretability toolbox can handle all these tasks within the same framework. We also show that our extracted surrogate model can approximate the performance of the corresponding ensemble model in all considered tasks, while selecting only a few trees from the whole forest. We also show that our proposed approach substantially outperforms other explainable methods in terms of predictive performance.
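The four steps listed in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the Bellatrex implementation: the abstract does not say how rules are pre-selected, vectorized, projected or clustered. Here pre-selection keeps the trees whose prediction is closest to the forest average, the rule vectors are supplied by the caller (the hypothetical `rule_vectors` argument), projection keeps the highest-variance coordinates, and clustering is a small k-means. All function names are made up for this sketch.

```python
from statistics import mean, pvariance

def preselect(tree_preds, keep):
    """Step 1 (assumed criterion): keep the trees closest to the forest mean."""
    forest = mean(tree_preds)
    order = sorted(range(len(tree_preds)), key=lambda i: abs(tree_preds[i] - forest))
    return sorted(order[:keep])

def project(vectors, dims):
    """Step 3 (crude stand-in for a real projection): keep the
    `dims` coordinates with the highest variance."""
    n_coords = len(vectors[0])
    variances = [pvariance([v[j] for v in vectors]) for j in range(n_coords)]
    top = sorted(sorted(range(n_coords), key=lambda j: -variances[j])[:dims])
    return [[v[j] for j in top] for v in vectors]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Step 4a: plain k-means, initialised with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids

def explain(tree_preds, rule_vectors, keep, dims, k):
    """Return the indices of the trees whose rules represent each cluster."""
    chosen = preselect(tree_preds, keep)
    # Step 2 is assumed done already: rule_vectors[i] describes tree i's rule.
    low = project([rule_vectors[i] for i in chosen], dims)
    labels, centroids = kmeans(low, k)
    picked = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # Step 4b: the member nearest to the centroid represents the cluster.
            best = min(members, key=lambda i: sq_dist(low[i], centroids[c]))
            picked.append(chosen[best])
    return sorted(picked)

# Toy input: six trees, their predictions for one instance, and made-up rule vectors.
preds = [0.9, 0.8, 0.85, 0.1, 0.82, 0.88]
vectors = [[1, 1, 1], [3, 0, 1], [3, 1, 1], [5, 5, 1], [0, 3, 1], [0, 4, 1]]
print(explain(preds, vectors, keep=4, dims=2, k=2))
```

In the toy input, tree 3 predicts very differently from the rest and is dropped in step 1, and the remaining rules fall into two groups, so the explanation consists of one rule from each group.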


Random forests, sound symbolism and Pokemon evolution

arXiv.org Artificial Intelligence

This study constructs machine learning algorithms that are trained to classify samples using sound symbolism, and then it reports on an experiment designed to measure their understanding against human participants. Random forests are trained using the names of Pokemon, which are fictional video game characters, and their evolutionary status. Pokemon undergo evolution when certain in-game conditions are met. Evolution changes the appearance, abilities, and names of Pokemon. In the first experiment, we train three random forests using the sounds that make up the names of Japanese, Chinese, and Korean Pokemon to classify Pokemon into pre-evolution and post-evolution categories. We then train a fourth random forest using the results of an elicitation experiment whereby Japanese participants named previously unseen Pokemon. In Experiment 2, we reproduce those random forests with name length as a feature and compare the performance of the random forests against humans in a classification experiment whereby Japanese participants classified the names elicited in Experiment 1 into pre- and post-evolution categories. Experiment 2 reveals an overfitting issue in Experiment 1, which we resolve using a novel cross-validation method. The results show that the random forests are efficient learners of systematic sound-meaning correspondence patterns and can classify samples with greater accuracy than the human participants.


Tutorial -- SimpleML for Sheets documentation

#artificialintelligence

It is sometimes interesting to understand what is inside a model. Under "What do you want to do?", select "Understand a model". Under "Models", select the model you just trained, called "My Model". Check the box "Include sheet data". In the "Summary" tab, you can see information about the input features of the model.


Chains of Autoreplicative Random Forests for missing value imputation in high-dimensional datasets

arXiv.org Artificial Intelligence

Missing values are a common problem in data science and machine learning. Removing instances with missing values can adversely affect the quality of further data analysis. This is exacerbated when there are many more features than instances, and thus the proportion of affected instances is high. Such a scenario is common in many important domains; for example, single nucleotide polymorphism (SNP) datasets provide a large number of features over a genome for a relatively small number of individuals. To preserve as much information as possible prior to modeling, a rigorous imputation scheme is acutely needed. While denoising autoencoders are a state-of-the-art method for imputation in high-dimensional data, they still require enough complete cases to be trained on, which are often not available in real-world problems. In this paper, we consider missing value imputation as a multi-label classification problem and propose Chains of Autoreplicative Random Forests. Using multi-label Random Forests instead of neural networks works well for low-sampled data as there are fewer parameters to optimize. Experiments on several SNP datasets show that our algorithm effectively imputes missing values based only on information from the dataset and exhibits better performance than standard algorithms that do not require any additional information. In this paper, the algorithm is implemented specifically for SNP data, but it can easily be adapted for other cases of missing value imputation.


OF-AE: Oblique Forest AutoEncoders

arXiv.org Artificial Intelligence

... (briefly CART) [2] have proven to be very successful methods for various data analysis problems. The original CART algorithm partitions the feature space using axis-parallel splits. The training of a classical decision tree T relies on greedy optimization, i.e. the root of the tree is the whole input space X, which is split into two disjoint regions, and this process continues in a recursive manner.

The usage of the clustering method of the ERCForest can be observed in the unsupervised algorithm RandomTreesEmbedding from SKLearn, where the data points are clustered according to which leaf they fall in. Furthermore, it is worth noticing that the ERCForest is eventually related to Clustering Trees (CT) introduced in [9] that are Decision Trees able to find natural ...
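The greedy, recursive, axis-parallel partitioning that the excerpt attributes to CART can be illustrated with a short sketch. It assumes a classification setting with Gini impurity as the split criterion and a depth limit as the stopping rule. Neither choice is stated in the excerpt, and the function names are illustrative only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_axis_parallel_split(X, y):
    """Greedy step: try every feature and every midpoint between observed
    values; return (feature, threshold, weighted impurity) or None."""
    n = len(X)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

def grow(X, y, depth=0, max_depth=3):
    """Recursively split the current region into two disjoint regions."""
    leaf = {"leaf": Counter(y).most_common(1)[0][0]}
    if depth >= max_depth or gini(y) == 0.0:
        return leaf
    split = best_axis_parallel_split(X, y)
    if split is None:  # all points identical, nothing to split on
        return leaf
    j, t, _ = split
    left = [i for i in range(len(X)) if X[i][j] <= t]
    right = [i for i in range(len(X)) if X[i][j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": grow([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }

def predict(node, x):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [4.0, 6.0]]
y = [0, 0, 1, 1]
tree = grow(X, y)
print(tree["feature"], tree["threshold"], predict(tree, [3.5, 0.0]))
```

Each split is chosen only by looking at the current region, which is what makes the optimization greedy: an earlier split is never revisited once deeper levels are grown.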


Internet of Things: Digital Footprints Carry A Device Identity

arXiv.org Artificial Intelligence

The usage of technologically advanced devices has seen a boom in many domains, including education, automation, and healthcare, with most of the services requiring Internet connectivity. To secure a network, device identification plays a key role. In this paper, a device fingerprinting (DFP) model is proposed that is able to distinguish between Internet of Things (IoT) and non-IoT devices, as well as uniquely identify individual devices. Four statistical features are extracted from five consecutive device-originated packets to generate individual device fingerprints. The method has been evaluated using the Random Forest (RF) classifier and different datasets. Experimental results show that the proposed method achieves up to 99.8% accuracy in distinguishing between IoT and non-IoT devices and over 97.6% in classifying individual devices. These results indicate that the proposed method is useful in assisting operators in making their networks more secure and robust to security breaches and unauthorised access.
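The fingerprinting step (four statistics per group of five consecutive packets) might look roughly like the sketch below. The abstract does not name the four features or the packet attribute they are computed from, so the choice of mean, standard deviation, minimum and maximum of packet sizes is purely an assumption, as is the use of non-overlapping windows.

```python
from statistics import mean, pstdev

def fingerprints(packet_sizes, window=5):
    """Turn a device's packet-size sequence into one 4-feature vector per
    non-overlapping window of `window` consecutive packets.
    The four statistics are placeholders, not the paper's features."""
    out = []
    for start in range(0, len(packet_sizes) - window + 1, window):
        chunk = packet_sizes[start:start + window]
        out.append((mean(chunk), pstdev(chunk), min(chunk), max(chunk)))
    return out

# Eleven packets give two full windows; the trailing packet is ignored.
sizes = [60, 60, 60, 60, 60, 100, 200, 300, 400, 500, 42]
for fp in fingerprints(sizes):
    print(fp)
```

Vectors of this kind would then be the rows fed to a classifier such as a random forest, with the device label (or an IoT / non-IoT flag) as the target.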


Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces

arXiv.org Artificial Intelligence

Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search, as they achieve good predictive performance with little or no manual tuning, naturally handle discrete feature spaces, and are relatively insensitive to outliers in the training data. Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function. To address both points simultaneously, we propose using the kernel interpretation of tree ensembles as a Gaussian Process prior to obtain model variance estimates, and we develop a compatible optimization formulation for the acquisition function. The latter further allows us to seamlessly integrate known constraints to improve sampling efficiency by considering domain-knowledge in engineering settings and modeling search space symmetries, e.g., hierarchical relationships in neural architecture search. Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
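One common way to read a tree ensemble as a kernel is to count how often two inputs land in the same leaf. The sketch below illustrates that idea with depth-1 trees (stumps) given as hypothetical `(feature, threshold)` pairs. Whether this is exactly the kernel used in the paper is not stated in the abstract, so treat it as an assumption.

```python
def leaf_id(stump, x):
    """A stump has two leaves: 0 if x[feature] <= threshold, else 1."""
    feature, threshold = stump
    return 0 if x[feature] <= threshold else 1

def ensemble_kernel(stumps, a, b):
    """Fraction of trees in which a and b fall into the same leaf."""
    same = sum(1 for s in stumps if leaf_id(s, a) == leaf_id(s, b))
    return same / len(stumps)

stumps = [(0, 0.5), (0, 1.5), (1, 0.0)]
a, b = (0.0, 1.0), (1.0, 1.0)
print(ensemble_kernel(stumps, a, a))  # identical points share every leaf
print(ensemble_kernel(stumps, a, b))  # separated only by the first stump
```

Because the value changes only when an input crosses a split threshold, the kernel is piecewise constant, which is consistent with the abstract's remark about the piece-wise constant acquisition function.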


Pinaki Laskar on LinkedIn: #ai #machinelearning #programming #aidevelopment

#artificialintelligence

What is the smartest artificial intelligence ever created? All of today's AI is not True AI, be it virtual assistants or autonomous vehicles or predictive applications or large language models or search engines or recommendation systems or language translators or facial recognition systems or Q&A systems or gamers. AI has not even reached a proof-of-concept demonstration phase to verify that its models, concepts or theories have the potential for real-world applications, that is, evidence demonstrating that AI projects/products are feasible. Real AI is not some infrastructure (ML platform, algorithms, data, compute) and development stack (from libraries to languages, IDE, workflow and visualisation): Some applied maths, probability theory and statistics; Some statistical learning algorithms, logic regression, linear regression, decision trees and random forests; Machine learning algorithms, supervised, unsupervised and reinforced; ANNs, DL algorithms and models, filtering the input data through many layers to predict and classify information; Optimizing (compressing and quantizing) trained neural network models; Some statistical patterns and inferences; Some programming languages, such as Python and R, with their libraries and packages; ML platforms, frameworks and runtimes such as PyTorch, ONNX, Apache MXNet, TensorFlow, Caffe2, CNTK, SciKit-Learn, and Keras; Inferencing SDKs like the Qualcomm Neural Processing SDK, integrated development environments (IDE), such as PyCharm, Microsoft VS Code, Jupyter, MATLAB, etc.; Physical servers, virtual machines, containers, specialized hardware such as GPUs, cloud-based computational resources including VMs, containers, and Serverless computing. Today's AI is so-called "Narrow AI", which is designed to perform a single task, and any knowledge gained from performing that task will not automatically be applied to other tasks.


On the utility of feature selection in building two-tier decision trees

arXiv.org Artificial Intelligence

Nowadays, feature selection is frequently used in machine learning when there is a risk of performance degradation due to overfitting or when computational resources are limited. During the feature selection process, the subset of features that are most relevant and least redundant is chosen. In recent years, it has become clear that, in addition to relevance and redundancy, features' complementarity must be considered. Informally, if the features are weak predictors of the target variable separately and strong predictors when combined, then they are complementary. This paper demonstrates that the synergistic effect of complementary features mutually amplifying each other in the construction of two-tier decision trees can be interfered with by another feature, resulting in a decrease in performance. Using cross-validation on both synthetic and real datasets, for regression and classification, it is shown that removing the interfering feature can improve performance by up to 24 times. It has also been found that the less well the domain is learned, the greater the increase in performance. More formally, there is a statistically significant negative rank correlation between performance on the dataset prior to the elimination of the interfering feature and performance growth after its elimination. It is concluded that this broadens the scope of feature selection methods to cases where data and computational resources are sufficient.
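The informal definition of complementarity (weak predictors separately, strong predictors together) has a textbook instance in XOR, sketched below with mutual information as the measure of predictive strength. The XOR data and the choice of mutual information are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Target is the XOR of two binary features.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
target = [0, 1, 1, 0]

print(mutual_information(a, target))               # each feature alone: no information
print(mutual_information(b, target))
print(mutual_information(list(zip(a, b)), target))  # the pair: the target is fully determined
```

A selection method that scores features one at a time would discard both `a` and `b` here, which is why complementarity has to be assessed on feature combinations.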