AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
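The idea that a decision tree represents a Boolean function can be shown with a minimal sketch. The tuple encoding, the `classify` helper, and the example tree below are illustrative assumptions, not taken from the book.

```python
# A tree is either a bool (leaf) or a tuple:
# (attribute_name, subtree_if_false, subtree_if_true).
def classify(tree, example):
    """Walk from the root to a leaf, testing one attribute per node."""
    while not isinstance(tree, bool):
        attribute, if_false, if_true = tree
        tree = if_true if example[attribute] else if_false
    return tree


# Hypothetical tree encoding the Boolean function: a AND (b OR c).
TREE = ("a", False, ("b", ("c", False, True), True))

decision = classify(TREE, {"a": True, "b": False, "c": True})
```

Each root-to-leaf path corresponds to one conjunction of attribute tests, which is why any Boolean function over the attributes can be written this way.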

MC2: Secure Collaborative Analytics for Machine Learning

#artificialintelligence | Oct-28-2022, 17:00:56 GMT

Machine Learning (ML) has gained prominence in recent years because of its ability to be applied across scores of industries and solve complex problems effectively. Yet, research shows that nearly 90% of AI/ML models never actually make it into production or hit the market. The main challenge is that ML/AI models require huge volumes of high-quality, accurate, and timely data to be effective, but organizations have long been reluctant to share sensitive information due to security and privacy concerns. Personal data is becoming more pervasive, causing privacy concerns to grow. As a result, global data protection laws have become stricter, and organizations face increasingly higher noncompliance risks. Mitigating such concerns and taking AI/ML to the next level requires a new approach to collaboration -- secure collaborative learning.

collaborate, enclave, encrypted data, (12 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.05)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.31)

Review on Classification Techniques used in Biophysiological Stress Monitoring

Iqbal, Talha, Elahi, Adnan, Shahzad, Atif, Wijns, William

arXiv.org Artificial Intelligence | Oct-28-2022

Cardiovascular activity is directly related to the body's response under stress. Stress, based on its intensity, can be divided into two types: acute stress (short-term) and chronic stress (long-term). Repeated acute stress and continuous chronic stress may play a vital role in inflammation in the circulatory system and thus lead to a heart attack or a stroke. In this study, we review commonly used machine learning classification techniques applied to the different stress-indicating parameters used in stress monitoring devices. These parameters include Photoplethysmograph (PPG), Electrocardiograph (ECG), Electromyograph (EMG), Galvanic Skin Response (GSR), Heart Rate Variability (HRV), skin temperature, respiratory rate, Electroencephalograph (EEG) and salivary cortisol. This study also discusses how to choose a classifier, a choice that depends on a number of factors other than accuracy, such as the number of subjects involved in an experiment, the type of signal processing, and computational limitations.

artificial intelligence, classifier, machine learning, (22 more...)

2210.1604

Country:

North America > United States (0.04)
Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(4 more...)

[2210.14518v1] Which Factors Matter Most? Can Startup Valuation be Micro-Targeted?

#artificialintelligence | Oct-27-2022, 00:08:46 GMT

While startup valuations are influenced by revenues, risks, age, and macroeconomic conditions, specific causality is traditionally a black box. Because valuations are not disclosed, roles played by other factors (industry, geography, and intellectual property) can often only be guessed at. VC valuation research indicates the importance of establishing a factor hierarchy to better understand startup valuations and their dynamics, suggesting the wisdom of hiring data scientists for this purpose. Bespoke understanding can be established via construction of hierarchical prediction models based on decision trees and random forests. These have the advantage of revealing which factors matter most. In combination with OLS, they also tell us the circumstances in which specific causalities apply. This study explores the deterministic role of categorical variables in the valuation of startups (i.e. the joint combination of geographic, urban, and sectoral denomination variables), in order to build a generalized valuation scorecard approach. Using a dataset of 1,091 venture-capital investments, containing 1,044 unique EU and EEA, this study examines microeconomic, sectoral, and local-level impacts on startup valuation. In principle, the study relies on fixed-effects and joint-fixed-effects regressions as well as the analysis and exploration of divergent micropopulations and fault lines by means of non-parametric approaches combining econometric and machine-learning techniques.

decision tree learning, machine learning, micro-targeted, (2 more...)

Genre: Research Report (1.00)

Industry: Banking & Finance > Capital Markets (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.53)

Exploring the Whole Rashomon Set of Sparse Decision Trees

Xin, Rui, Zhong, Chudi, Chen, Zhi, Takagi, Takuya, Seltzer, Margo, Rudin, Cynthia

arXiv.org Artificial Intelligence | Oct-25-2022

In any given machine learning problem, there might be many models that explain the data almost equally well. However, most learning algorithms return only one of these models, leaving practitioners with no practical way to explore alternative models that might have desirable properties beyond what could be expressed by a loss function. The Rashomon set is the set of all these almost-optimal models. Rashomon sets can be large in size and complicated in structure, particularly for highly nonlinear function classes that allow complex interaction terms, such as decision trees. We provide the first technique for completely enumerating the Rashomon set for sparse decision trees; in fact, our work provides the first complete enumeration of any Rashomon set for a non-trivial problem with a highly nonlinear discrete function class. This allows the user an unprecedented level of control over model choice among all models that are approximately equally good. We represent the Rashomon set in a specialized data structure that supports efficient querying and sampling. We show three applications of the Rashomon set: 1) it can be used to study variable importance for the set of almost-optimal trees (as opposed to a single tree), 2) the Rashomon set for accuracy enables enumeration of the Rashomon sets for balanced accuracy and F1-score, and 3) the Rashomon set for a full dataset can be used to produce Rashomon sets constructed with only subsets of the data set. Thus, we are able to examine Rashomon sets across problems with a new lens, enabling users to choose models rather than be at the mercy of an algorithm that produces only a single model.
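As a rough illustration of what "almost-optimal" means, the sketch below filters a finite pool of candidate models by their loss. It assumes an additive tolerance `epsilon` and a precomputed loss per model; the abstract does not state the paper's exact objective or tolerance rule, and the model names and loss values are hypothetical.

```python
def rashomon_set(losses, epsilon):
    """Return the names of all models whose loss is within epsilon of the best.

    `losses` maps a model name to its loss on the data (assumed given).
    The additive tolerance is an assumption made for illustration only.
    """
    best = min(losses.values())
    return {name for name, loss in losses.items() if loss <= best + epsilon}


# Hypothetical losses for three candidate trees.
candidate_losses = {"tree_a": 0.10, "tree_b": 0.11, "tree_c": 0.20}
near_optimal = rashomon_set(candidate_losses, epsilon=0.02)
```

This brute-force filter only works when every candidate can be listed and scored; the difficulty the abstract addresses is doing this over the full space of sparse trees.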

artificial intelligence, decision tree learning, machine learning, (17 more...)

2209.0804

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report (1.00)

Industry:

Education (0.66)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Fast Optimization of Weighted Sparse Decision Trees for use in Optimal Treatment Regimes and Optimal Policy Design

Behrouz, Ali, Lecuyer, Mathias, Rudin, Cynthia, Seltzer, Margo

arXiv.org Artificial Intelligence | Oct-25-2022

Sparse decision trees are one of the most common forms of interpretable models. While recent advances have produced algorithms that fully optimize sparse decision trees for prediction, that work does not address policy design, because the algorithms cannot handle weighted data samples. Specifically, they rely on the discreteness of the loss function, which means that real-valued weights cannot be directly used. For example, none of the existing techniques produce policies that incorporate inverse propensity weighting on individual data points. We present three algorithms for efficient sparse weighted decision tree optimization. The first approach directly optimizes the weighted loss function; however, it tends to be computationally inefficient for large datasets. Our second approach, which scales more efficiently, transforms weights to integer values and uses data duplication to transform the weighted decision tree optimization problem into an unweighted (but larger) counterpart. Our third algorithm, which scales to much larger datasets, uses a randomized procedure that samples each data point with a probability proportional to its weight. We present theoretical bounds on the error of the two fast methods and show experimentally that these methods can be two orders of magnitude faster than the direct optimization of the weighted loss, without losing significant accuracy.
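The two faster ideas described above, integer-weight duplication and weight-proportional sampling, can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function names, the `scale` factor used for rounding, and the fixed seed are all hypothetical.

```python
import random


def duplicate_by_weight(samples, weights, scale=10):
    """Turn real-valued weights into repetition counts (assumed rounding rule).

    Each sample is repeated round(weight * scale) times, so an unweighted
    learner run on the result approximates the weighted problem.
    """
    expanded = []
    for sample, weight in zip(samples, weights):
        expanded.extend([sample] * round(weight * scale))
    return expanded


def sample_by_weight(samples, weights, n, seed=0):
    """Draw n samples with replacement, each with probability proportional to its weight."""
    rng = random.Random(seed)
    return rng.choices(samples, weights=weights, k=n)


data = ["x1", "x2"]
w = [0.3, 0.7]
duplicated = duplicate_by_weight(data, w)      # larger, unweighted dataset
resampled = sample_by_weight(data, w, n=100)   # fixed-size random subsample
```

Duplication grows the dataset with the chosen scale, while sampling keeps the size fixed at the cost of randomness, which matches the trade-off the abstract describes between the second and third approaches.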

artificial intelligence, decision tree learning, machine learning, (15 more...)

2210.06825

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Europe > Netherlands (0.05)
North America > United States > North Carolina > Durham County > Durham (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine (1.00)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

No imputation without representation

Lenz, Oliver Urs, Peralta, Daniel, Cornelis, Chris

arXiv.org Artificial Intelligence | Oct-25-2022

Imputation allows datasets to be used with algorithms that cannot handle missing values by themselves. However, missing values may in principle contribute useful information that is lost through imputation. The missing-indicator approach can be used to preserve this information. There are several theoretical considerations why missing-indicators may or may not be beneficial, but there has not been any large-scale practical experiment on real-life datasets to test this question for machine learning predictions. We perform this experiment for three imputation strategies and a range of different classification algorithms, on the basis of twenty real-life datasets. We find that missing-indicators generally increase classification performance, and that nearest neighbour and iterative imputation do not lead to better performance than simple mean/mode imputation. Therefore, we recommend the use of missing-indicators with mean/mode imputation as a safe default, with the caveat that for decision trees, pruning is necessary to prevent overfitting.
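A minimal sketch of the recommended default, mean imputation combined with missing-indicators, is given below for numeric columns only. The helper name and the use of `None` to mark missing values are assumptions for illustration; mode imputation for categorical columns is not shown.

```python
def impute_with_indicators(rows):
    """Mean-impute missing numeric values and append 0/1 missing-indicators.

    `rows` is a list of equal-length lists in which None marks a missing value.
    One indicator column is appended per original column (an assumption; some
    libraries add indicators only for columns that contain missing values).
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)

    result = []
    for row in rows:
        values = [means[j] if row[j] is None else row[j] for j in range(n_cols)]
        flags = [1 if row[j] is None else 0 for j in range(n_cols)]
        result.append(values + flags)
    return result


table = [[1.0, None], [3.0, 4.0]]
imputed = impute_with_indicators(table)
```

The indicator columns keep the information that a value was missing, which plain imputation would otherwise discard.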

artificial intelligence, imputation, machine learning, (20 more...)

2206.14254

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Wisconsin (0.04)
North America > United States > Texas (0.04)
(6 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.96)
Health & Medicine > Therapeutic Area > Oncology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
(3 more...)

Improving Data Quality with Training Dynamics of Gradient Boosting Decision Trees

Ponti, Moacir Antonelli, Oliveira, Lucas de Angelis, Román, Juan Martín, Argerich, Luis

arXiv.org Artificial Intelligence | Oct-20-2022

Real-world datasets contain incorrectly labeled instances that hamper the performance of the model and, in particular, its ability to generalize out of distribution. Also, each example might contribute differently to learning. This motivates studies that seek to better understand the role of data instances with respect to their contribution to good model metrics. In this paper we propose a method based on metrics computed from the training dynamics of Gradient Boosting Decision Trees (GBDTs) to assess the behavior of each training example. We focus on datasets containing mostly tabular or structured data, for which Decision Tree ensembles are still the state of the art in terms of performance. We show results on detecting noisy labels so that they can be removed, improving models' metrics on synthetic and real datasets, as well as on a production dataset. Our methods achieved the best results overall when compared with confident learning and heuristics.

artificial intelligence, inductive learning, machine learning, (17 more...)

2210.11327

Country:

North America > United States > Wisconsin (0.04)
Asia (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.87)
(2 more...)

Comparing Machine Learning Techniques for Alfalfa Biomass Yield Prediction

Vance, Jonathan, Rasheed, Khaled, Missaoui, Ali, Maier, Frederick, Adkins, Christian, Whitmire, Chris

arXiv.org Artificial Intelligence | Oct-20-2022

The alfalfa crop is globally important as livestock feed, so highly efficient planting and harvesting could benefit many industries, especially as the global climate changes and traditional methods become less accurate. Recent work using machine learning (ML) to predict yields for alfalfa and other crops has shown promise. Previous efforts used remote sensing, weather, planting, and soil data to train machine learning models for yield prediction. However, while remote sensing works well, the models require large amounts of data and cannot make predictions until the harvesting season begins. Using weather and planting data from alfalfa variety trials in Kentucky and Georgia, our previous work compared feature selection techniques to find the best technique and best feature set. In this work, we trained a variety of machine learning models, using cross validation for hyperparameter optimization, to predict biomass yields, and we showed better accuracy than similar work that employed more complex techniques. Our best individual model was a random forest with a mean absolute error of 0.081 tons/acre and $R^2$ of 0.941. Next, we expanded this dataset to include Wisconsin and Mississippi, and we repeated our experiments, obtaining a higher best $R^2$ of 0.982 with a regression tree. We then isolated our testing datasets by state to explore this problem's eligibility for domain adaptation (DA), as we trained on multiple source states and tested on one target state. This Trivial DA (TDA) approach leaves plenty of room for improvement through exploring more complex DA techniques in forthcoming work.
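For readers unfamiliar with the two reported metrics, the sketch below computes mean absolute error and the coefficient of determination using their standard textbook definitions. The toy yield values are made up for illustration and are unrelated to the paper's data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical yields in tons/acre.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 2.0]   # a model that always predicts the mean

mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
```

A model that always predicts the mean of the observed values scores zero on this second metric, and a perfect model scores one, so values such as 0.941 and 0.982 indicate that most of the variance in yield is explained.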

artificial intelligence, machine learning, prediction, (19 more...)

2210.11226

Country:

North America > United States > Mississippi (0.25)
Europe > Denmark > Capital Region > Copenhagen (0.05)
North America > United States > Georgia > Clarke County > Athens (0.05)
(8 more...)

Genre: Research Report > Experimental Study (0.69)

Industry:

Food & Agriculture > Agriculture (1.00)
Government > Regional Government > North America Government > United States Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Distributional Adaptive Soft Regression Trees

Umlauf, Nikolaus, Klein, Nadja

arXiv.org Machine Learning | Oct-19-2022

Random forests are an ensemble method relevant for many problems, such as regression or classification. They are popular due to their good predictive performance (compared to, e.g., decision trees) while requiring only minimal tuning of hyperparameters. They are built via aggregation of multiple regression trees during training, and the trees are usually grown recursively using hard splitting rules. Recently, regression forests have been incorporated into the framework of distributional regression, a now popular regression approach that aims to estimate complete conditional distributions rather than relating only the mean of an output variable to input features, as is done classically. This article proposes a new type of distributional regression tree using a multivariate soft split rule. One great advantage of the soft split is that smooth high-dimensional functions can be estimated with only one tree while the complexity of the function is controlled adaptively by information criteria. Moreover, the search for the optimal split variable is obsolete. We show by means of extensive simulation studies that the algorithm has excellent properties and outperforms various benchmark methods, especially in the presence of complex non-linear feature interactions. Finally, we illustrate the usefulness of our approach with an example on probabilistic forecasts for the Sun's activity.
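To make the contrast with hard splits concrete, here is a sketch of a single soft split that blends two leaf values with a logistic gate over a linear combination of all features. The logistic form, the function name, and the parameter values are assumptions for illustration; the abstract does not specify the paper's actual split rule.

```python
import math


def soft_split_predict(x, weights, bias, left_value, right_value):
    """Blend two leaf values instead of choosing one.

    A hard split sends x entirely left or right based on one feature.
    Here a logistic gate (assumed form) over all features gives the
    probability of going left, and the prediction is the weighted average.
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p_left = 1.0 / (1.0 + math.exp(-z))
    return p_left * left_value + (1.0 - p_left) * right_value


# On the gate's decision boundary (z = 0), the two leaves are averaged equally.
on_boundary = soft_split_predict([0.0, 0.0], [1.0, 1.0], 0.0, left_value=2.0, right_value=4.0)
# Far from the boundary, the prediction approaches a single leaf value.
far_left = soft_split_predict([10.0, 10.0], [1.0, 1.0], 0.0, left_value=2.0, right_value=4.0)
```

Because the gate uses every feature at once, there is no separate search for a single split variable, and the output varies smoothly with the inputs.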

artificial intelligence, decision tree learning, machine learning, (19 more...)

2210.10389

Country:

Europe > Austria > Tyrol > Innsbruck (0.04)
North America > United States > New York (0.04)
Europe > Germany > Berlin (0.04)
Asia > China (0.04)

Genre: Research Report (0.40)

Industry:

Energy (0.68)
Government > Space Agency (0.47)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Robust Trees for Security

#artificialintelligence | Oct-18-2022, 08:25:44 GMT

Tree models are widely used for security tasks, such as detecting malicious autonomous systems, social engineering, malware distribution, phishing emails, advertising resources for ad blockers, and online scams. Despite their popularity, the robustness of tree models has not been thoroughly studied in the context of security applications. In this post, I will show how to train robust trees to detect Twitter spam. Our most exciting result is that we can increase the feature manipulation cost for adaptive attackers to evade the robust tree ensemble by 10.6X. We used the dataset from Kwon et al. and re-extracted 25 features.

attacker, robust split, threat model, (13 more...)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.73)
