AITopics | Decision Tree Learning


Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
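The quoted idea can be sketched minimally as follows, assuming a tree stored as nested attribute tests whose leaves carry yes/no decisions. The `Leaf`, `Node`, and `classify` names and the attributes `A`, `B`, `C` are hypothetical illustrations, not taken from the book.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """A terminal node holding the yes/no decision."""
    decision: bool


@dataclass
class Node:
    """An internal node that tests one attribute and branches on its value."""
    attribute: str
    branches: Dict[object, "Tree"]


Tree = Union[Leaf, Node]


def classify(tree: Tree, example: Dict[str, object]) -> bool:
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.decision


# Hypothetical tree computing the Boolean function (A and B) or C.
tree = Node("C", {
    True: Leaf(True),
    False: Node("A", {
        False: Leaf(False),
        True: Node("B", {True: Leaf(True), False: Leaf(False)}),
    }),
})

print(classify(tree, {"A": True, "B": False, "C": False}))
```

Widening `Leaf.decision` beyond `bool` (for example, to a class label) would let the same structure represent functions with a larger range of outputs.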


Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems

Georgakoudis, Giorgis, Parasyris, Konstantinos, Liao, Chunhua, Beckingsale, David, Gamblin, Todd, de Supinski, Bronis

arXiv.org Artificial Intelligence, Mar-15-2023

The end of Dennard scaling, which had allowed processor clock frequencies to increase continuously as transistors were miniaturized, combined with the continuation of Moore's law, which predicts that the number of CMOS transistors on a microchip doubles every two years, shifted the technology trend towards parallel architectures. In the early 2000s, parallel computer system architectures focused on multi-core CPUs. Later, the introduction of GPGPU paradigms pivoted technology trends towards heterogeneous systems composed of both multi-core CPUs and GPUs. This heterogeneity exposed the challenge of software performance portability, which seeks to achieve equivalent performance from a single application implementation regardless of the underlying hardware architecture. Programming models such as OmpSs [9], OpenMP, Kokkos [10], and RAJA [15] provide abstractions that hide the vendor-specific interfaces required to develop applications on these heterogeneous parallel architectures, and they offer unified interfaces to express parallelism. Although these programming models provide a single, convenient layer for implementing portable code, the performance of the same application can vary when it is executed on different architectures and systems. Thus, these programming models express portable code efficiently, but the performance portability of applications executed on different heterogeneous systems remains unspecified. For example, HPC programmers have found that a single version of source code, with an associated static definition of ex…

artificial intelligence, decision tree learning, machine learning

arXiv.org Artificial Intelligence

2303.08873

Country:

North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Tyne and Wear > Sunderland (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Semiconductors & Electronics (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


Predicting Individualized Effects of Internet-Based Treatment for Genito-Pelvic Pain/Penetration Disorder: Development and Internal Validation of a Multivariable Decision Tree Model

Zarski, Anna-Carlotta, Harrer, Mathias, Kuper, Paula, Sprenger, Antonia A., Berking, Matthias, Ebert, David Daniel

arXiv.org Machine Learning, Mar-15-2023

Genito-Pelvic Pain/Penetration Disorder (GPPPD) is a common disorder but is rarely treated in routine care. Previous research documents that GPPPD symptoms can be treated effectively with internet-based psychological interventions. However, non-response remains common for all state-of-the-art treatments, and it is unclear which patient groups can be expected to benefit most from an internet-based intervention. Multivariable prediction models are increasingly used to identify predictors of heterogeneous treatment effects and to allocate treatments with the greatest expected benefit. In this study, we developed and internally validated a multivariable decision tree model that predicts the effects of an internet-based treatment on a multidimensional composite score of GPPPD symptoms. Data from a randomized controlled trial comparing the internet-based intervention to a waitlist control group ($N$ = 200) were used to develop a decision tree model using model-based recursive partitioning. Model performance was assessed by examining the apparent and the bootstrap bias-corrected performance. The final pruned decision tree consisted of one splitting variable, joint dyadic coping, based on which two response clusters emerged. No effect was found for patients with low dyadic coping ($n$ = 33; $d$ = 0.12; 95% CI: -0.57 to 0.80), while large effects ($d$ = 1.00; 95% CI: 0.68 to 1.32; $n$ = 167) are predicted for those with high dyadic coping at baseline. The bootstrap bias-corrected performance of the model was $R^2$ = 27.74% (RMSE = 13.22).
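The contrast between apparent and bootstrap bias-corrected performance can be illustrated with a minimal sketch of Harrell-style optimism correction, one common form of bootstrap bias correction. Several assumptions apply: the model is a plain one-split least-squares regression tree (a stump) rather than the model-based recursive partitioning used in the study, the performance measure is $R^2$, the data are synthetic, and all function names are hypothetical.

```python
import random


def fit_stump(x, y):
    """Least-squares regression stump on one predictor.

    Returns (threshold, left_mean, right_mean); threshold is None if no split exists.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    total_sq = sum(v * v for _, v in pairs)
    best_sse, best = float("inf"), (None, total / n, total / n)
    run_sum = run_sq = 0.0
    for k in range(1, n):
        run_sum += pairs[k - 1][1]
        run_sq += pairs[k - 1][1] ** 2
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates tied x values
        # SSE of each side: sum of squares minus (sum)^2 / count.
        sse = (run_sq - run_sum ** 2 / k) + ((total_sq - run_sq) - (total - run_sum) ** 2 / (n - k))
        if sse < best_sse:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_sse, best = sse, (threshold, run_sum / k, (total - run_sum) / (n - k))
    return best


def predict(stump, value):
    threshold, left_mean, right_mean = stump
    if threshold is None:
        return left_mean  # no split: both means equal the overall mean
    return left_mean if value <= threshold else right_mean


def stump_r2(stump, x, y):
    """Coefficient of determination of the stump's predictions on (x, y)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - predict(stump, xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def optimism_corrected_r2(x, y, n_boot=200, seed=0):
    """Return (apparent R^2, bootstrap optimism-corrected R^2)."""
    rng = random.Random(seed)
    apparent = stump_r2(fit_stump(x, y), x, y)
    n = len(x)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xb, yb = [x[i] for i in idx], [y[i] for i in idx]
        stump = fit_stump(xb, yb)
        # Optimism: how much better the refitted stump looks on its own bootstrap
        # sample than on the original data.
        optimism += stump_r2(stump, xb, yb) - stump_r2(stump, x, y)
    return apparent, apparent - optimism / n_boot


# Synthetic data with a single true cut-point at x = 0.5 (purely illustrative).
data_rng = random.Random(1)
x = [data_rng.random() for _ in range(200)]
y = [(10.0 if xi > 0.5 else 0.0) + data_rng.gauss(0, 1) for xi in x]
apparent, corrected = optimism_corrected_r2(x, y)
print(f"apparent R^2 = {apparent:.3f}, optimism-corrected R^2 = {corrected:.3f}")
```

The corrected value subtracts the average optimism from the apparent fit; with a strong, clean split as in this toy data the correction is small, whereas noisier data and more flexible trees typically widen the gap.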

artificial intelligence, internet-based treatment, machine learning

arXiv.org Machine Learning

2303.08732

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.05)
Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)
Europe > Sweden (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Education > Educational Setting (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


Are Models Trained on Indian Legal Data Fair?

Girhepuje, Sahil, Goel, Anmol, Krishnan, Gokul S, Goyal, Shreya, Pandey, Satyendra, Kumaraguru, Ponnurangam, Ravindran, Balaraman

arXiv.org Artificial Intelligence, Mar-14-2023

Recent advances and applications of language technology and artificial intelligence have enabled much success across multiple domains such as law, medicine, and mental health. AI-based language models, for tasks such as judgement prediction, have recently been proposed for the legal sector. However, these models are rife with social biases encoded from the training data. While bias and fairness have been studied across NLP, most studies are situated primarily within a Western context. In this work, we present an initial investigation of fairness from the Indian perspective in the legal domain. We highlight the propagation of learnt algorithmic biases in the bail prediction task for models trained on Hindi legal documents. We evaluate the fairness gap using demographic parity and show that a decision tree model trained for the bail prediction task has an overall fairness disparity of 0.237 between input features associated with Hindus and Muslims. Additionally, we highlight the need for further research on fairness and bias in applying AI in the legal sector, with a specific focus on the Indian context.
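Demographic parity compares how often a model predicts the positive outcome for each group. A minimal sketch, assuming binary predictions (1 = bail granted, a hypothetical encoding) and one group label per example; the paper may operationalise the gap differently, and the names below are illustrative.

```python
from typing import Sequence


def demographic_parity_gap(predictions: Sequence[int], groups: Sequence[str],
                           group_a: str, group_b: str) -> float:
    """Absolute difference in positive-prediction rates between two groups."""
    def positive_rate(group: str) -> float:
        preds = [p for p, g in zip(predictions, groups) if g == group]
        if not preds:
            raise ValueError(f"no examples for group {group!r}")
        return sum(preds) / len(preds)

    return abs(positive_rate(group_a) - positive_rate(group_b))


# Hypothetical predictions for eight cases, four per group.
preds = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(demographic_parity_gap(preds, groups, "A", "B"))
```

A gap of 0 means both groups receive positive predictions at the same rate; larger values indicate greater disparity.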

machine learning, natural language, prediction

arXiv.org Artificial Intelligence

2303.07247

Country:

Asia > India > Uttar Pradesh (0.05)
South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.50)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)


Adversarial random forests for density estimation and generative modeling

Watson, David S., Blesch, Kristin, Kapar, Jan, Wright, Marvin N.

arXiv.org Artificial Intelligence, Mar-13-2023

We propose methods for density estimation and data synthesis using a novel form of unsupervised random forests. Inspired by generative adversarial networks, we implement a recursive procedure in which trees gradually learn structural properties of the data through alternating rounds of generation and discrimination. The method is provably consistent under minimal assumptions. Unlike classic tree-based alternatives, our approach provides smooth (un)conditional densities and allows for fully synthetic data generation. We achieve comparable or superior performance to state-of-the-art probabilistic circuits and deep learning models on various tabular data benchmarks while executing about two orders of magnitude faster on average. An accompanying $\texttt{R}$ package, $\texttt{arf}$, is available on $\texttt{CRAN}$.
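The generation step can be sketched very roughly under strong simplifying assumptions: leaf assignments are given (here a single hypothetical partition rather than a trained forest), the discrimination rounds are omitted, a leaf is chosen with probability proportional to its coverage, and each feature is then drawn independently from the training values in that leaf. This only illustrates sampling with independent features within a leaf; it is not the `arf` package's API, and unlike the smooth densities described above, it simply resamples observed values.

```python
import random
from collections import defaultdict
from typing import List, Sequence


def sample_from_leaves(rows: Sequence[Sequence[float]], leaf_ids: Sequence[int],
                       n: int, seed: int = 0) -> List[List[float]]:
    """Draw synthetic rows: pick a leaf by coverage, then sample each column independently."""
    rng = random.Random(seed)
    by_leaf = defaultdict(list)
    for row, leaf in zip(rows, leaf_ids):
        by_leaf[leaf].append(row)
    leaves = list(by_leaf)
    coverage = [len(by_leaf[leaf]) for leaf in leaves]  # training rows per leaf
    synthetic = []
    for _ in range(n):
        leaf = rng.choices(leaves, weights=coverage)[0]
        members = by_leaf[leaf]
        # Each column is drawn separately from rows in the same leaf.
        synthetic.append([rng.choice(members)[j] for j in range(len(members[0]))])
    return synthetic


# Hypothetical data and partition: leaf 0 holds small values, leaf 1 large ones.
rows = [[0.1, 1.0], [0.2, 1.1], [0.3, 0.9], [5.0, 9.0], [5.2, 9.5]]
leaf_ids = [0, 0, 0, 1, 1]
samples = sample_from_leaves(rows, leaf_ids, n=100)
```

Because sampling stays within one leaf, features remain jointly consistent across leaves (small values never pair with large ones here) even though they are drawn independently inside each leaf.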

artificial intelligence, machine learning, random forest

arXiv.org Artificial Intelligence

2205.09435

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York (0.04)
Europe > Germany > Bremen > Bremen (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Detection of DDoS Attacks in Software Defined Networking Using Machine Learning Models

Hamarshe, Ahmad, Ashqar, Huthaifa I., Hamarsheh, Mohammad

arXiv.org Artificial Intelligence, Mar-11-2023

Software Defined Networking (SDN) is a modern approach to networking that separates the control plane from the data plane through network abstraction, resulting in a more flexible, programmable, and dynamic architecture than traditional networks. The separation of control and data planes has led to a high degree of network resilience, but it has also given rise to new security risks, including distributed denial-of-service (DDoS) attacks, which pose a new challenge in the SDN environment. In this paper, we investigate the effectiveness of machine learning algorithms for detecting DDoS attacks in SDN environments. Four algorithms (Random Forest, Decision Tree, Support Vector Machine, and XGBoost) were tested on the CICDDoS2019 dataset after dropping the timestamp feature, among others. Performance was assessed by accuracy, precision, recall, and F1 score, with the Random Forest algorithm achieving the highest accuracy, at 68.9%. The results indicate that ML-based detection is a more accurate and effective method for identifying DDoS attacks in SDN, despite the computational requirements of non-parametric algorithms.
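The standard binary-detection measures (accuracy, precision, recall, and F1) all derive from the four confusion-matrix counts. A minimal sketch, assuming labels encoded as 1 = attack and 0 = benign (a hypothetical encoding); libraries such as scikit-learn provide equivalent functions.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = attack, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of alarms that are real attacks
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of attacks that raise an alarm
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for eight flows.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Reporting precision and recall alongside accuracy matters for attack detection, where classes are often imbalanced and accuracy alone can look deceptively high.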

algorithm, artificial intelligence, machine learning

arXiv.org Artificial Intelligence

2303.06513

Country: Asia > China (0.04)

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.57)


Credit Card Fraud Detection Using Enhanced Random Forest Classifier for Imbalanced Data

Aburbeian, AlsharifHasan Mohamad, Ashqar, Huthaifa I.

arXiv.org Artificial Intelligence, Mar-11-2023

The credit card has become the most popular payment method for both online and offline transactions. The development of technology and the rise in fraud cases have created the need for a fraud detection algorithm that precisely identifies and stops fraudulent activity. This paper implements the random forest (RF) algorithm to solve the problem at hand, using a dataset of credit card transactions. The main problem in credit card fraud detection is the imbalanced dataset, in which most of the transactions are non-fraudulent. To overcome this imbalance, the synthetic minority over-sampling technique (SMOTE) was used, and hyperparameter tuning was applied to enhance the performance of the random forest classifier. The results showed that the RF classifier achieved an accuracy of 98% and an F1-score of about 98%, which is promising. We also believe that our model is relatively easy to apply and can overcome the issue of imbalanced data in fraud detection applications.
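SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal sketch of that idea with hypothetical names; the study does not state which implementation it used, and in practice a library class such as imbalanced-learn's `SMOTE` is the usual choice.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Generate synthetic minority samples by interpolating towards nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself), by Euclidean distance.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority (fraud) samples at the corners of a unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=10, k=2)
```

With `k=2`, each corner's neighbours are the two adjacent corners, so every synthetic point lands on an edge of the square, which shows that SMOTE fills in the region between existing minority samples rather than duplicating them.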

artificial intelligence, decision tree learning, machine learning

arXiv.org Artificial Intelligence

2303.06514

Country:

Asia > Singapore (0.04)
Asia > Middle East > Palestine (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)


NFL Career Success as Predicted by NFL Scouting Combine

Szekely, Brian, Sinnott, Christian, Halow, Savannah, Ryan, Gregory

arXiv.org Artificial Intelligence, Mar-10-2023

The National Football League (NFL) Scouting Combine serves as a tool to evaluate the skills of prospective players and assess their readiness to play in the NFL. The development of machine learning brings new opportunities for assessing the utility of the Scouting Combine. Using machine and statistical learning, it may be possible to predict the future success of prospective athletes, as well as which Scouting Combine tests are the most important. Results from statistical learning research have been contradictory as to whether the Scouting Combine is a useful metric for player success. In this study, we investigate whether machine learning can be used to determine matriculation and future success in the NFL. Using Scouting Combine data, we evaluate the ability of six different algorithms to predict whether a potential draft pick will play a single NFL snap (matriculation). If a player is drafted, we predict how many snaps they go on to play (success). We are able to predict matriculation with 83% accuracy; however, we are unable to predict later success. Our best-performing algorithm returns large error and low explained variance (RMSE = 1,210 snaps; $R^2$ = 0.17). These findings indicate that while the Scouting Combine can predict NFL matriculation, it may not be a reliable predictor of long-term player success.
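The two regression measures reported for snap counts are easy to compute directly: RMSE is in the target's units (snaps), while $R^2$ is the share of variance explained relative to always predicting the mean. A minimal sketch; the snap counts below are hypothetical, not the study's data.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target (here: snaps)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of the target's variance explained by the predictions (1 - SS_res / SS_tot)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical career snap counts and model predictions for four players.
snaps_true = [0, 500, 1500, 3000]
snaps_pred = [200, 800, 1200, 2400]
print(rmse(snaps_true, snaps_pred), r_squared(snaps_true, snaps_pred))
```

Read together, an $R^2$ of 0.17 means the model explains only about a sixth of the variation in career snaps, which is consistent with a large RMSE.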

artificial intelligence, machine learning, scouting combine

arXiv.org Artificial Intelligence

2303.05774

Country: North America > United States > Nevada > Washoe County > Reno (0.04)

Genre: Research Report > New Finding (0.50)

Industry: Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.51)


Lexical Complexity Prediction: An Overview

North, Kai, Zampieri, Marcos, Shardlow, Matthew

arXiv.org Artificial Intelligence, Mar-8-2023

Understanding the meaning of words in context is fundamental for reading comprehension. The perceived difficulty of a target word within a given text, hereafter referred to as complexity, varies widely among readers. With an increased demand for distance learning and educational technologies [107], research into automatically predicting which words are likely to cause comprehension problems is becoming a popular area [115, 147, 185]. Systems have been created to identify complex words that are difficult to acquire, reproduce, or understand for children [79], second-language learners [89], people with a reading disability such as dyslexia [131] or aphasia [35, 53], or, more generally, individuals with low literacy [59, 175]. In Computational Linguistics and Natural Language Processing (NLP), the task of automatically recognizing complex words is most often achieved by training machine learning (ML) models. These ML models assign a complexity value to each target word within an input extract, sentence, or text, which allows complex words to be identified. This information can then be used to improve downstream lexical and text simplification systems that provide simpler alternatives to aid reading comprehension. Take the extract shown in Table 1 for example.
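To make the per-word scoring step concrete, here is a toy sketch assuming two hand-picked features, word length and corpus frequency per million words, combined linearly into a score in [0, 1]; words above a threshold are flagged for simplification. The frequency table, weights, and threshold are all hypothetical, whereas the models surveyed here learn such scores from annotated data.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical frequencies (occurrences per million words); unseen words count as rare.
FREQ_PER_MILLION: Dict[str, float] = {
    "the": 50000.0, "doctor": 300.0, "said": 1500.0,
    "was": 9000.0, "benign": 2.0, "tumour": 5.0,
}


def complexity(word: str) -> float:
    """Toy complexity score in [0, 1]: longer and rarer words score higher."""
    freq = FREQ_PER_MILLION.get(word.lower(), 0.1)
    rarity = 1 - min(math.log10(freq + 1) / 5, 1.0)  # very common words -> near 0
    length = min(len(word) / 12, 1.0)
    return 0.7 * rarity + 0.3 * length  # hypothetical weights


def flag_complex(text: str, threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Score every token and return those above the threshold."""
    scored = [(w, round(complexity(w), 2)) for w in text.split()]
    return [(w, s) for w, s in scored if s > threshold]


print(flag_complex("The doctor said the tumour was benign"))
```

A downstream simplification system would then look for simpler substitutes only for the flagged words.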

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

doi: 10.1145/3557885

2303.04851

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Thailand > Bangkok > Bangkok (0.05)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material (0.87)

Industry:

Health & Medicine (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.92)
Education > Educational Setting > Online (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Forecasting the movements of Bitcoin prices: an application of machine learning algorithms

Pabuccu, Hakan, Ongan, Serdar, Ongan, Ayse

arXiv.org Artificial Intelligence, Mar-8-2023

Cryptocurrencies such as Bitcoin are among the most controversial and complex technological innovations in today's financial system. This study aims to forecast the movements of Bitcoin prices with a high degree of accuracy. To this end, four Machine Learning (ML) algorithms are applied, namely Support Vector Machines (SVM), Artificial Neural Networks (ANN), Naive Bayes (NB), and Random Forest (RF), with logistic regression (LR) as a benchmark model. To test these algorithms, a discrete dataset was created and used in addition to the existing continuous dataset. To evaluate algorithm performance, the F statistic, the accuracy statistic, the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE), and the Relative Absolute Error (RAE) were used. A t test was used to compare the performance of SVM, ANN, NB, and RF with that of LR. Empirical findings reveal that, in the continuous dataset, RF has the highest forecasting performance and NB the lowest. In the discrete dataset, by contrast, ANN has the highest and NB the lowest performance. Furthermore, the discrete dataset improves the overall forecasting performance of all estimated algorithms (models).
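Two of the error measures named above can be sketched briefly. MAE averages absolute errors, while RAE, commonly defined as the relative absolute error, divides the total absolute error by that of a naive predictor that always outputs the mean of the true values, so values below 1 beat that baseline (it is often reported as a percentage). The labels and predicted probabilities below are hypothetical (1 = price up).

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Total absolute error relative to always predicting the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    baseline = sum(abs(t - mean_true) for t in y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / baseline


# Hypothetical direction labels (1 = price up) and predicted probabilities of "up".
moves = [1, 0, 1, 1, 0]
probs = [0.9, 0.2, 0.6, 0.8, 0.1]
print(mae(moves, probs), relative_absolute_error(moves, probs))
```

Because RAE is normalised by the baseline's error, it allows comparisons across datasets whose targets have different spreads, such as the continuous and discrete versions described above.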

algorithm, artificial intelligence, machine learning

arXiv.org Artificial Intelligence

doi: 10.3934/QFE.2020031

2303.04642

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Asia > Middle East > Republic of Türkiye > Bayburt Province > Bayburt (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)


Optimal Sparse Recovery with Decision Stumps

Banihashem, Kiarash, Hajiaghayi, MohammadTaghi, Springer, Max

arXiv.org Artificial Intelligence, Mar-7-2023

Decision trees are widely used for their low computational cost, good predictive performance, and ability to assess the importance of features. Although they are often used in practice for feature selection, their theoretical guarantees are not well understood. Here we obtain a tight finite sample bound for the feature selection problem in linear regression using single-depth decision trees. We examine the statistical properties of these "decision stumps" for the recovery of the $s$ active features from $p$ total features, where $s \ll p$. Our analysis provides tight sample performance guarantees on high-dimensional sparse systems that align with the finite sample bound of $O(s \log p)$ obtained by Lasso, improving upon previous bounds for both the median and optimal splitting criteria. Our results extend to the non-linear regime as well as to arbitrary sub-Gaussian distributions, demonstrating that tree-based methods attain strong feature selection properties under a wide variety of settings and further shedding light on the success of these methods in practice. As a byproduct of our analysis, we show that recovery can be provably guaranteed even when the number of active features $s$ is unknown. We further validate our theoretical results and proof methodology using computational experiments.
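Feature selection with decision stumps can be sketched as: fit one split per feature, score each feature by how much that split reduces squared error, and keep the top $s$. A minimal sketch, assuming the median splitting criterion and a known $s$; the function names and synthetic data are hypothetical, and this is not the paper's exact estimator or analysis.

```python
import random
from typing import List, Sequence


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def median_split_gain(column: Sequence[float], y: Sequence[float]) -> float:
    """Reduction in squared error from one split of y at the column's median (a decision stump)."""
    ordered = sorted(range(len(y)), key=lambda i: column[i])
    half = len(ordered) // 2
    left = [y[i] for i in ordered[:half]]
    right = [y[i] for i in ordered[half:]]
    return sse(y) - sse(left) - sse(right)


def select_features(X: Sequence[Sequence[float]], y: Sequence[float], s: int) -> List[int]:
    """Rank features by stump gain and keep the indices of the top s."""
    p = len(X[0])
    gains = [median_split_gain([row[j] for row in X], y) for j in range(p)]
    return sorted(range(p), key=lambda j: gains[j], reverse=True)[:s]


# Synthetic sparse linear model: only features 3 and 11 of 20 are active.
rng = random.Random(0)
n, p = 500, 20
X = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
y = [2 * row[3] - 3 * row[11] for row in X]
print(sorted(select_features(X, y, s=2)))
```

Active features produce a large drop in squared error when split at their median, whereas inactive features only reduce it by chance, which is the intuition behind the recovery guarantees.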

artificial intelligence, machine learning, theorem 5

arXiv.org Artificial Intelligence

2303.04301

Country: North America > United States > Maryland (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
