AITopics | Decision Tree Learning

Collaborating Authors

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.

News Overviews Instructional Materials AI-Alerts Classics

Are Trees Really Green? A Detection Approach of IoT Malware Attacks

Sanna, Silvia Lucia, Soi, Diego, Maiorca, Davide, Giacinto, Giorgio

arXiv.org Artificial IntelligenceJun-10-2025

Nowadays, the Internet of Things (IoT) is widely employed, and its usage is growing exponentially because it facilitates remote monitoring, predictive maintenance, and data-driven decision making, especially in the healthcare and industrial sectors. However, IoT devices remain vulnerable due to their resource constraints and difficulty in applying security patches. Consequently, various cybersecurity attacks are reported daily, such as Denial of Service, particularly in IoT-driven solutions. Most attack detection methodologies are based on Machine Learning (ML) techniques, which can detect attack patterns. However, the focus is more on identification rather than considering the impact of ML algorithms on computational resources. This paper proposes a green methodology to identify IoT malware networking attacks based on flow privacy-preserving statistical features. In particular, the hyperparameters of three tree-based models -- Decision Trees, Random Forest and Extra-Trees -- are optimized based on energy consumption and test-time performance in terms of Matthew's Correlation Coefficient. Our results show that models maintain high performance and detection accuracy while consistently reducing power usage in terms of watt-hours (Wh). This suggests that on-premise ML-based Intrusion Detection Systems are suitable for IoT and other resource-constrained devices.

artificial intelligence, energy consumption, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.07836

Country:

North America > United States (0.14)
Europe > Italy (0.14)

Genre: Research Report > New Finding (0.86)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.87)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(2 more...)

Add feedback

Even Faster Hyperbolic Random Forests: A Beltrami-Klein Wrapper Approach

Chlenski, Philippe, Pe'er, Itsik

arXiv.org Artificial IntelligenceJun-6-2025

Decision trees and models that use them as primitives are workhorses of machine learning in Euclidean spaces. Recent work has further extended these models to the Lorentz model of hyperbolic space by replacing axis-parallel hyperplanes with homogeneous hyperplanes when partitioning the input space. In this paper, we show how the hyperDT algorithm can be elegantly reexpressed in the Beltrami-Klein model of hyperbolic spaces. This preserves the thresholding operation used in Euclidean decision trees, enabling us to further rewrite hyperDT as simple pre- and post-processing steps that form a wrapper around existing tree-based models designed for Euclidean spaces. The wrapper approach unlocks many optimizations already available in Euclidean space models, improving flexibility, speed, and accuracy while offering a simpler, more maintainable, and extensible codebase. Our implementation is available at https://github.com/pchlenski/hyperdt.

artificial intelligence, fast-hyperdt, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2506.0436

Genre: Research Report (0.43)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Interpretable reinforcement learning for heat pump control through asymmetric differentiable decision trees

Van Puyvelde, Toon, Zareh, Mehran, Develder, Chris

arXiv.org Artificial IntelligenceJun-3-2025

In recent years, deep reinforcement learning (DRL) algorithms have gained traction in home energy management systems. However, their adoption by energy management companies remains limited due to the black-box nature of DRL, which fails to provide transparent decision-making feedback. To address this, explainable reinforcement learning (XRL) techniques have emerged, aiming to make DRL decisions more transparent. Among these, soft differential decision tree (DDT) distillation provides a promising approach due to the clear decision rules they are based on, which can be efficiently computed. However, achieving high performance often requires deep, and completely full, trees, which reduces interpretability. To overcome this, we propose a novel asymmetric soft DDT construction method. Unlike traditional soft DDTs, our approach adaptively constructs trees by expanding nodes only when necessary. This improves the efficient use of decision nodes, which require a predetermined depth to construct full symmetric trees, enhancing both interpretability and performance. We demonstrate the potential of asymmetric DDTs to provide transparent, efficient, and high-performing decision-making in home energy management systems.

machine learning, node, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3679240.3734671

2506.01641

Genre: Research Report > Promising Solution (0.67)

Industry:

Energy > Power Industry (0.76)
Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Direct Use of Geothermal Energy > Geothermal Heating, Ventilation, and Air Conditioning (HVAC) System (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning

Luo, Jiaqi, Yuan, Yuan, Xu, Shixin

arXiv.org Artificial IntelligenceJun-3-2025

Tabular-image multimodal learning, which integrates structured tabular data with imaging data, holds great promise for a variety of tasks, especially in medical applications. Yet, two key challenges remain: (1) the lack of a standardized, pretrained representation for tabular data, as is commonly available in vision and language domains; and (2) the difficulty of handling missing values in the tabular modality, which are common in real-world medical datasets. To address these issues, we propose the T abPFN-Integrated M ultimodal E ngine ( TIME), a novel multimodal framework that builds on the recently introduced tabular foundation model, TabPFN. TIME leverages TabPFN as a frozen tabular encoder to generate robust, strong embeddings that are naturally resilient to missing data, and combines them with image features from pretrained vision backbones. Extensive experiments demonstrate that TIME consistently outperforms competitive baselines across both complete and incomplete tabular inputs, underscoring its practical value in real-world mul-timodal learning scenarios. Keywords: Multimodal Learning, Tabular-Image, Pretrained Model, TabPFN 1. Introduction Multimodal learning has emerged as a powerful paradigm for integrating diverse data sources to enhance learning and decision-making across a wide range of domains [1]. Among the many forms of multimodal integration, tabular-image multimodal learning plays a uniquely important role, especially in clinical and biomedical applications [2, 3]. In such settings, structured tabular data such as laboratory test results often coexist with unstructured imaging data like X-rays.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2506.00813

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
(5 more...)

Add feedback

Review for NeurIPS paper: Model Class Reliance for Random Forests

Neural Information Processing SystemsJun-2-2025, 01:09:46 GMT

Weaknesses: The main concern I have with the paper is in the argument that the estimator does in fact converge to MCR and MCR- for random forests. Section 4.1 provides an argument that, as the number of trees goes to infinity, each tree will be replaced with one from its Rashomon set that is maximally dependent on X1 (when doing the MCR procedure). In finite samples, and with a finite number of trees, there are reasons to doubt whether this method provides consistent estimation of MCR for the random forest as a whole. The favorable generalization properties of random forests are known to be derived from the diversity of trees in the ensemble: a well known result of Breiman is that the generalization error decreases as the correlation of the residuals from the trees decreases. While the predictions of a tree and its surrogate may be identical for a given dataset, replacing a tree with the surrogate seems that it may decrease the expected generalization error of the tree as a whole.

model class reliance, random forest, simulation, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Review for NeurIPS paper: Decision trees as partitioning machines to characterize their generalization properties

Neural Information Processing SystemsJun-1-2025, 00:44:49 GMT

Weaknesses: * From the theoretical analysis, the main weakness might be the analysis on pure continuous features. Nowadays, it is very unlikely to have this scenario in the most challenging machine learning problems. Thus, the theoretical implications can be limited due to this. From the empirical evaluations, I am curious to know why the method was only compared to a very old algorithm such as CART. Is it the only reasonable algorithm out there for decision trees that can be comparable to the method proposed?

decision tree, generalization property, neurips paper

Neural Information Processing Systems

Industry: Education > Focused Education > Special Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.81)

Add feedback

Review for NeurIPS paper: Decision trees as partitioning machines to characterize their generalization properties

Neural Information Processing SystemsJun-1-2025, 00:44:41 GMT

The paper provides an analysis of the combinatorial properties of classes of decision trees. The reviewers found the results to be valuable, both from a theoretical perspective (novel approach in the proofs), and from a practical perspective (leading to an SRM approach to pruning).

decision tree, generalization property, neurips paper

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.82)

Add feedback

LLMPR: A Novel LLM-Driven Transfer Learning based Petition Ranking Model

Gayen, Avijit, Chakraborty, Somyajit, Sen, Mainak, Paul, Soham, Jana, Angshuman

arXiv.org Artificial IntelligenceMay-29-2025

The persistent accumulation of unresolved legal cases, especially within the Indian judiciary, significantly hampers the timely delivery of justice. Manual methods of prioritizing petitions are often prone to inefficiencies and subjective biases further exacerbating delays. To address this issue, we propose LLMPR (Large Language Model-based Petition Ranking), an automated framework that utilizes transfer learning and machine learning to assign priority rankings to legal petitions based on their contextual urgency. Leveraging the ILDC dataset comprising 7,593 annotated petitions, we process unstructured legal text and extract features through various embedding techniques, including DistilBERT, LegalBERT, and MiniLM. These textual embeddings are combined with quantitative indicators such as gap days, rank scores, and word counts to train multiple machine learning models, including Random Forest, Decision Tree, XGBoost, LightGBM, and CatBoost. Our experiments demonstrate that Random Forest and Decision Tree models yield superior performance, with accuracy exceeding 99% and a Spearman rank correlation of 0.99. Notably, models using only numerical features achieve nearly optimal ranking results (R2 = 0.988, \r{ho} = 0.998), while LLM-based embeddings offer only marginal gains. These findings suggest that automated petition ranking can effectively streamline judicial workflows, reduce case backlog, and improve fairness in legal prioritization.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.21689

Country: Asia > India (0.68)

Genre: Research Report > New Finding (1.00)

Industry: Law > Government & the Courts (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Autoencoding Random Forests

Vu, Binh Duc, Kapar, Jan, Wright, Marvin, Watson, David S.

arXiv.org Machine LearningMay-28-2025

We propose a principled method for autoencoding with random forests. Our strategy builds on foundational results from nonparametric statistics and spectral graph theory to learn a low-dimensional embedding of the model that optimally represents relationships in the data. We provide exact and approximate solutions to the decoding problem via constrained optimization, split relabeling, and nearest neighbors regression. These methods effectively invert the compression pipeline, establishing a map from the embedding space back to the input space using splits learned by the ensemble's constituent trees. The resulting decoders are universally consistent under common regularity assumptions. The procedure works with supervised or unsupervised models, providing a window into conditional or joint distributions. We demonstrate various applications of this autoencoder, including powerful new tools for visualization, compression, clustering, and denoising. Experiments illustrate the ease and utility of our method in a wide range of settings, including tabular, image, and genomic data.

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv.org Machine Learning

2505.21441

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Bremen > Bremen (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
(2 more...)

Add feedback

Interpretable DNFs

Cooper, Martin C., Bousdira, Imane, Carbonnel, Clément

arXiv.org Artificial IntelligenceMay-28-2025

A classifier is considered interpretable if each of its decisions has an explanation which is small enough to be easily understood by a human user. A DNF formula can be seen as a binary classifier $κ$ over boolean domains. The size of an explanation of a positive decision taken by a DNF $κ$ is bounded by the size of the terms in $κ$, since we can explain a positive decision by giving a term of $κ$ that evaluates to true. Since both positive and negative decisions must be explained, we consider that interpretable DNFs are those $κ$ for which both $κ$ and $\overlineκ$ can be expressed as DNFs composed of terms of bounded size. In this paper, we study the family of $k$-DNFs whose complements can also be expressed as $k$-DNFs. We compare two such families, namely depth-$k$ decision trees and nested $k$-DNFs, a novel family of models. Experiments indicate that nested $k$-DNFs are an interesting alternative to decision trees in terms of interpretability and accuracy.

artificial intelligence, decision tree, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.21212

Country: Europe > France (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback