AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Trained Random Forests Completely Reveal your Dataset

Ferry, Julien, Fukasawa, Ricardo, Pascal, Timothée, Vidal, Thibaut

arXiv.org Artificial IntelligenceFeb-29-2024

We introduce an optimization-based reconstruction attack capable of completely or near-completely reconstructing a dataset utilized for training a random forest. Notably, our approach relies solely on information readily available in commonly used libraries such as scikit-learn. To achieve this, we formulate the reconstruction problem as a combinatorial problem under a maximum likelihood objective. We demonstrate that this problem is NP-hard, though solvable at scale using constraint programming -- an approach rooted in constraint propagation and solution-domain reduction. Through an extensive computational investigation, we demonstrate that random forests trained without bootstrap aggregation but with feature randomization are susceptible to a complete reconstruction. This holds true even with a small number of trees. Even with bootstrap aggregation, the majority of the data can also be reconstructed. These findings underscore a critical vulnerability inherent in widely adopted ensemble methods, warranting attention and mitigation. Although the potential for such reconstruction attacks has been discussed in privacy research, our study provides clear empirical evidence of their practicability.

constraint, dataset, experiment, (14 more...)

arXiv.org Artificial Intelligence

2402.19232

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > Quebec > Montreal (0.04)
(6 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)

Add feedback

An Adversarial Robustness Benchmark for Enterprise Network Intrusion Detection

Vitorino, João, Silva, Miguel, Maia, Eva, Praça, Isabel

arXiv.org Artificial IntelligenceFeb-25-2024

As cyber-attacks become more sophisticated, improving the robustness of Machine Learning (ML) models must be a priority for enterprises of all sizes. To reliably compare the robustness of different ML models for cyber-attack detection in enterprise computer networks, they must be evaluated in standardized conditions. This work presents a methodical adversarial robustness benchmark of multiple decision tree ensembles with constrained adversarial examples generated from standard datasets. The robustness of regularly and adversarially trained RF, XGB, LGBM, and EBM models was evaluated on the original CICIDS2017 dataset, a corrected version of it designated as NewCICIDS, and the HIKARI dataset, which contains more recent network traffic. NewCICIDS led to models with a better performance, especially XGB and EBM, but RF and LGBM were less robust against the more recent cyber-attacks of HIKARI. Overall, the robustness of the models to adversarial cyber-attack examples was improved without their generalization to regular traffic being affected, enabling a reliable detection of suspicious activity without costly increases of false alarms.

artificial intelligence, decision tree learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2402.16912

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Europe > Portugal > Porto > Porto (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)
(4 more...)

Add feedback

Verifiable Boosted Tree Ensembles

Calzavara, Stefano, Cazzaro, Lorenzo, Lucchese, Claudio, Pibiri, Giulio Ermanno

arXiv.org Machine LearningFeb-22-2024

Verifiable learning advocates for training machine learning models amenable to efficient security verification. Prior research demonstrated that specific classes of decision tree ensembles -- called large-spread ensembles -- allow for robustness verification in polynomial time against any norm-based attacker. This study expands prior work on verifiable learning from basic ensemble methods (i.e., hard majority voting) to advanced boosted tree ensembles, such as those trained using XGBoost or LightGBM. Our formal results indicate that robustness verification is achievable in polynomial time when considering attackers based on the $L_\infty$-norm, but remains NP-hard for other norm-based attackers. Nevertheless, we present a pseudo-polynomial time algorithm to verify robustness against attackers based on the $L_p$-norm for any $p \in \mathbb{N} \cup \{0\}$, which in practice grants excellent performance. Our experimental evaluation shows that large-spread boosted ensembles are accurate enough for practical adoption, while being amenable to efficient security verification.

ensemble, robustness, verification, (17 more...)

arXiv.org Machine Learning

2402.14988

Country:

North America > United States (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Random forests for detecting weak signals and extracting physical information: a case study of magnetic navigation

Moradi, Mohammadamin, Zhai, Zheng-Meng, Nielsen, Aaron, Lai, Ying-Cheng

arXiv.org Artificial IntelligenceFeb-21-2024

It was recently demonstrated that two machine-learning architectures, reservoir computing and time-delayed feed-forward neural networks, can be exploited for detecting the Earth's anomaly magnetic field immersed in overwhelming complex signals for magnetic navigation in a GPS-denied environment. The accuracy of the detected anomaly field corresponds to a positioning accuracy in the range of 10 to 40 meters. To increase the accuracy and reduce the uncertainty of weak signal detection as well as to directly obtain the position information, we exploit the machine-learning model of random forests that combines the output of multiple decision trees to give optimal values of the physical quantities of interest. In particular, from time-series data gathered from the cockpit of a flying airplane during various maneuvering stages, where strong background complex signals are caused by other elements of the Earth's magnetic field and the fields produced by the electronic systems in the cockpit, we demonstrate that the random-forest algorithm performs remarkably well in detecting the weak anomaly field and in filtering the position of the aircraft. With the aid of the conventional inertial navigation system, the positioning error can be reduced to less than 10 meters. We also find that, contrary to the conventional wisdom, the classic Tolles-Lawson model for calibrating and removing the magnetic field generated by the body of the aircraft is not necessary and may even be detrimental for the success of the random-forest method.

anomaly field, navigation, random forest, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1063/5.0189564

2402.14131

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.14)
North America > United States > New York > Nassau County > Mineola (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Stable Update of Regression Trees

Blørstad, Morten, Lunde, Berent Å. S., Blaser, Nello

arXiv.org Artificial IntelligenceFeb-21-2024

Updating machine learning models with new information usually improves their predictive performance, yet, in many applications, it is also desirable to avoid changing the model predictions too much. This property is called stability. In most cases when stability matters, so does explainability. We therefore focus on the stability of an inherently explainable machine learning method, namely regression trees. We aim to use the notion of empirical stability and design algorithms for updating regression trees that provide a way to balance between predictability and empirical stability. To achieve this, we propose a regularization method, where data points are weighted based on the uncertainty in the initial model. The balance between predictability and empirical stability can be adjusted through hyperparameters. This regularization method is evaluated in terms of loss and stability and assessed on a broad range of data characteristics. The results show that the proposed update method improves stability while achieving similar or better predictive performance. This shows that it is possible to achieve both predictive and stable results when updating regression trees.

configuration, dataset, stability, (15 more...)

arXiv.org Artificial Intelligence

2402.13655

Country:

North America > United States > New York > New York County > New York City (0.15)
North America > United States > California (0.05)
Europe > Norway > Western Norway > Vestland > Bergen (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Towards MLOps: A DevOps Tools Recommender System for Machine Learning System

Shah, Pir Sami Ullah, Ahmad, Naveed, Beg, Mirza Omer

arXiv.org Artificial IntelligenceFeb-20-2024

Applying DevOps practices to machine learning system is termed as MLOps and machine learning systems evolve on new data unlike traditional systems on requirements. The objective of MLOps is to establish a connection between different open-source tools to construct a pipeline that can automatically perform steps to construct a dataset, train the machine learning model and deploy the model to the production as well as store different versions of model and dataset. Benefits of MLOps is to make sure the fast delivery of the new trained models to the production to have accurate results. Furthermore, MLOps practice impacts the overall quality of the software products and is completely dependent on open-source tools and selection of relevant open-source tools is considered as challenged while a generalized method to select an appropriate open-source tools is desirable. In this paper, we present a framework for recommendation system that processes the contextual information (e.g., nature of data, type of the data) of the machine learning project and recommends a relevant toolchain (tech-stack) for the operationalization of machine learning systems. To check the applicability of the proposed framework, four different approaches i.e., rule-based, random forest, decision trees and k-nearest neighbors were investigated where precision, recall and f-score is measured, the random forest out classed other approaches with highest f-score value of 0.66.

mlop, open-source tool, pipeline, (13 more...)

arXiv.org Artificial Intelligence

2402.12867

Country:

Europe > Switzerland (0.05)
Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.05)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (0.83)

Industry: Information Technology (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)

Add feedback

User Modeling and User Profiling: A Comprehensive Survey

Purificato, Erasmo, Boratto, Ludovico, De Luca, Ernesto William

arXiv.org Artificial IntelligenceFeb-20-2024

The integration of artificial intelligence (AI) into daily life, particularly through information retrieval and recommender systems, has necessitated advanced user modeling and profiling techniques to deliver personalized experiences. These techniques aim to construct accurate user representations based on the rich amounts of data generated through interactions with these systems. This paper presents a comprehensive survey of the current state, evolution, and future directions of user modeling and profiling research. We provide a historical overview, tracing the development from early stereotype models to the latest deep learning techniques, and propose a novel taxonomy that encompasses all active topics in this research area, including recent trends. Our survey highlights the paradigm shifts towards more sophisticated user profiling methods, emphasizing implicit data collection, multi-behavior modeling, and the integration of graph data structures. We also address the critical need for privacy-preserving techniques and the push towards explainability and fairness in user modeling approaches. By examining the definitions of core terminology, we aim to clarify ambiguities and foster a clearer understanding of the field by proposing two novel encyclopedic definitions of the main terms. Furthermore, we explore the application of user modeling in various domains, such as fake news detection, cybersecurity, and personalized education. This survey serves as a comprehensive resource for researchers and practitioners, offering insights into the evolution of user modeling and profiling and guiding the development of more personalized, ethical, and effective AI systems.

13th international conference, fifteenth acm international conference, ieee international conference, (17 more...)

arXiv.org Artificial Intelligence

2402.0966

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
North America > United States > California > San Francisco County > San Francisco (0.13)
(42 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material > Online (0.92)
(2 more...)

Industry:

Media (1.00)
Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
(4 more...)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Communications > Social Media (1.00)
(14 more...)

Add feedback

Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests

Liu, Brian, Mazumder, Rahul

arXiv.org Machine LearningFeb-19-2024

We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by Mentch and Zhou (2020), where the authors argue that random forests reduce effective degrees of freedom and only outperform bagging ensembles in low signal-to-noise ratio (SNR) settings, we explore how random forests can uncover patterns in the data missed by bagging. We empirically demonstrate that in the presence of such patterns, random forests reduce bias along with variance and increasingly outperform bagging ensembles when SNR is high. Our observations offer insights into the real-world success of random forests across a range of SNRs and enhance our understanding of the difference between random forests and bagging ensembles with respect to the randomization injected into each split. Our investigations also yield practical insights into the importance of tuning mtry in random forests.

ensemble, random forest, snr, (16 more...)

arXiv.org Machine Learning

2402.12668

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Kernel KMeans clustering splits for end-to-end unsupervised decision trees

Ohl, Louis, Mattei, Pierre-Alexandre, Leclercq, Mickaël, Droit, Arnaud, Precioso, Frédéric

arXiv.org Machine LearningFeb-19-2024

Trees are convenient models for obtaining explainable predictions on relatively small datasets. Although there are many proposals for the end-to-end construction of such trees in supervised learning, learning a tree end-to-end for clustering without labels remains an open challenge. As most works focus on interpreting with trees the result of another clustering algorithm, we present here a novel end-to-end trained unsupervised binary tree for clustering: Kauri. This method performs a greedy maximisation of the kernel KMeans objective without requiring the definition of centroids. We compare this model on multiple datasets with recent unsupervised trees and show that Kauri performs identically when using a linear kernel. For other kernels, Kauri often outperforms the concatenation of kernel KMeans and a CART decision tree.

algorithm, dataset, kernel, (17 more...)

arXiv.org Machine Learning

2402.12232

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > El Salvador (0.04)
North America > Canada > Quebec (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback

Empirical and Experimental Insights into Data Mining Techniques for Crime Prediction: A Comprehensive Survey

Taha, Kamal

arXiv.org Artificial IntelligenceFeb-17-2024

This survey paper presents a comprehensive analysis of crime prediction methodologies, exploring the various techniques and technologies utilized in this area. The paper covers the statistical methods, machine learning algorithms, and deep learning techniques employed to analyze crime data, while also examining their effectiveness and limitations. We propose a methodological taxonomy that classifies crime prediction algorithms into specific techniques. This taxonomy is structured into four tiers, including methodology category, methodology sub-category, methodology techniques, and methodology sub-techniques. Empirical and experimental evaluations are provided to rank the different techniques. The empirical evaluation assesses the crime prediction techniques based on four criteria, while the experimental evaluation ranks the algorithms that employ the same sub-technique, the different sub-techniques that employ the same technique, the different techniques that employ the same methodology sub-category, the different methodology sub-categories within the same category, and the different methodology categories. The combination of methodological taxonomy, empirical evaluations, and experimental comparisons allows for a nuanced and comprehensive understanding of crime prediction algorithms, aiding researchers in making informed decisions. Finally, the paper provides a glimpse into the future of crime prediction techniques, highlighting potential advancements and opportunities for further research in this field

crime prediction, effectiveness, prediction accuracy, (16 more...)

arXiv.org Artificial Intelligence

2403.0078

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
North America > United States > Maryland > Baltimore (0.14)
(50 more...)

Genre:

Overview (1.00)
Instructional Material (1.00)
Research Report > New Finding (0.93)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
(6 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(9 more...)

Add feedback