Goto

Collaborating Authors

 Decision Tree Learning


Data, Trees, and Forests -- Decision Tree Learning in K-12 Education

arXiv.org Artificial Intelligence

Closely linked to the topic of ML is data science, which is of particular interest for approaches in machine learning and As a consequence of the increasing influence of thus also reflected in multiple AI curricula. Corresponding machine learning on our lives, everyone needs methods are also used to gain knowledge in a wide variety competencies to understand corresponding phenomena, of scientific disciplines. Data analysis and artificial intelligence but also to get involved in shaping our are often referred to as the fourth pillar of science world and making informed decisions regarding (Riedel et al., 2008; Tolle et al., 2011). This is becoming the influences on our society. Therefore, in K-increasingly relevant for K-12 education as well, as this shift 12 education, students need to learn about core in the scientific disciplines is also reflected in corresponding ideas and principles of machine learning.


A Kriging-Random Forest Hybrid Model for Real-time Ground Property Prediction during Earth Pressure Balance Shield Tunneling

arXiv.org Artificial Intelligence

A kriging-random forest hybrid model is developed for real-time ground property prediction ahead of the earth pressure balanced shield by integrating Kriging extrapolation and random forest, which can guide shield operating parameter selection thereby mitigate construction risks. The proposed KRF algorithm synergizes two types of information: prior information and real-time information. The previously predicted ground properties with EPB operating parameters are extrapolated via the Kriging algorithm to provide prior information for the prediction of currently being excavated ground properties. The real-time information refers to the real-time operating parameters of the EPB shield, which are input into random forest to provide a real-time prediction of ground properties. The integration of these two predictions is achieved by assigning weights to each prediction according to their uncertainties, ensuring the prediction of KRF with minimum uncertainty. The performance of the KRF algorithm is assessed via a case study of the Changsha Metro Line 4 project. It reveals that the proposed KRF algorithm can predict ground properties with an accuracy of 93%, overperforming the existing algorithms of LightGBM, AdaBoost-CART, and DNN by 29%, 8%, and 12%, respectively. Another dataset from Shenzhen Metro Line 13 project is utilized to further evaluate the model generalization performance, revealing that the model can transfer its learned knowledge from one region to another with an accuracy of 89%.


Machine-Learning-Based Classification of GPS Signal Reception Conditions Using a Dual-Polarized Antenna in Urban Areas

arXiv.org Artificial Intelligence

In urban areas, dense buildings frequently block and reflect global positioning system (GPS) signals, resulting in the reception of a few visible satellites with many multipath signals. This is a significant problem that results in unreliable positioning in urban areas. If a signal reception condition from a certain satellite can be detected, the positioning performance can be improved by excluding or de-weighting the multipath contaminated satellite signal. Thus, we developed a machine-learning-based method of classifying GPS signal reception conditions using a dual-polarized antenna. We employed a decision tree algorithm for classification using three features, one of which can be obtained only from a dual-polarized antenna. A machine-learning model was trained using GPS signals collected from various locations. When the features extracted from the GPS raw signal are input, the generated machine-learning model outputs one of the three signal reception conditions: non-line-of-sight (NLOS) only, line-of-sight (LOS) only, or LOS+NLOS. Multiple testing datasets were used to analyze the classification accuracy, which was then compared with an existing method using dual single-polarized antennas. Consequently, when the testing dataset was collected at different locations from the training dataset, a classification accuracy of 64.47% was obtained, which was slightly higher than the accuracy of the existing method using dual single-polarized antennas. Therefore, the dual-polarized antenna solution is more beneficial than the dual single-polarized antenna solution because it has a more compact form factor and its performance is similar to that of the other solution.


Efficient Online Decision Tree Learning with Active Feature Acquisition

arXiv.org Artificial Intelligence

Constructing decision trees online is a classical machine learning problem. Existing works often assume that features are readily available for each incoming data point. However, in many real world applications, both feature values and the labels are unknown a priori and can only be obtained at a cost. For example, in medical diagnosis, doctors have to choose which tests to perform (i.e., making costly feature queries) on a patient in order to make a diagnosis decision (i.e., predicting labels). We provide a fresh perspective to tackle this practical challenge. Our framework consists of an active planning oracle embedded in an online learning scheme for which we investigate several information acquisition functions. Specifically, we employ a surrogate information acquisition function based on adaptive submodularity to actively query feature values with a minimal cost, while using a posterior sampling scheme to maintain a low regret for online prediction. We demonstrate the efficiency and effectiveness of our framework via extensive experiments on various real-world datasets. Our framework also naturally adapts to the challenging setting of online learning with concept drift and is shown to be competitive with baseline models while being more flexible.


Construction of Decision Trees and Acyclic Decision Graphs from Decision Rule Systems

arXiv.org Artificial Intelligence

In this paper, we consider the problems of transforming systems of decision rules into decision trees. This paper builds upon our previous work [12]. In that paper, we showed that the minimum depth of a decision tree derived from the decision rule system can be much less than the number of different attributes in the rules from the system. In such cases, it is reasonable to use decision trees. In the present paper, for some types of decision rule systems and problems, we prove the existence of polynomial time algorithms for the construction of decision trees and two types of acyclic decision graphs representing decision trees. In all other cases, we prove the absence of such algorithms using the fact that the minimum number of nodes in decision trees or acyclic decision graphs can grow as a superpolynomial function depending on the size of decision rule systems. To avoid difficulties related to the number of nodes in the decision trees, we discuss also the possibility of not building the entire decision tree, but describing the computation path in this tree for the given input.


Interpreting Deep Forest through Feature Contribution and MDI Feature Importance

arXiv.org Artificial Intelligence

Deep forest is a non-differentiable deep model which has achieved impressive empirical success across a wide variety of applications, especially on categorical/symbolic or mixed modeling tasks. Many of the application fields prefer explainable models, such as random forests with feature contributions that can provide local explanation for each prediction, and Mean Decrease Impurity (MDI) that can provide global feature importance. However, deep forest, as a cascade of random forests, possesses interpretability only at the first layer. From the second layer on, many of the tree splits occur on the new features generated by the previous layer, which makes existing explanatory tools for random forests inapplicable. To disclose the impact of the original features in the deep layers, we design a calculation method with an estimation step followed by a calibration step for each layer, and propose our feature contribution and MDI feature importance calculation tools for deep forest. Experimental results on both simulated data and real world data verify the effectiveness of our methods.


The Combination of Metal Oxides as Oxide Layers for RRAM and Artificial Intelligence

arXiv.org Artificial Intelligence

Resistive random-access memory (RRAM) is a promising candidate for next-generation memory devices due to its high speed, low power consumption, and excellent scalability. Metal oxides are commonly used as the oxide layer in RRAM devices due to their high dielectric constant and stability. However, to further improve the performance of RRAM devices, recent research has focused on integrating artificial intelligence (AI). AI can be used to optimize the performance of RRAM devices, while RRAM can also power AI as a hardware accelerator and in neuromorphic computing. This review paper provides an overview of the combination of metal oxides-based RRAM and AI, highlighting recent advances in these two directions. We discuss the use of AI to improve the performance of RRAM devices and the use of RRAM to power AI. Additionally, we address key challenges in the field and provide insights into future research directions


The SZ flux-mass ($Y$-$M$) relation at low halo masses: improvements with symbolic regression and strong constraints on baryonic feedback

arXiv.org Artificial Intelligence

Feedback from active galactic nuclei (AGN) and supernovae can affect measurements of integrated SZ flux of halos ($Y_\mathrm{SZ}$) from CMB surveys, and cause its relation with the halo mass ($Y_\mathrm{SZ}-M$) to deviate from the self-similar power-law prediction of the virial theorem. We perform a comprehensive study of such deviations using CAMELS, a suite of hydrodynamic simulations with extensive variations in feedback prescriptions. We use a combination of two machine learning tools (random forest and symbolic regression) to search for analogues of the $Y-M$ relation which are more robust to feedback processes for low masses ($M\lesssim 10^{14}\, h^{-1} \, M_\odot$); we find that simply replacing $Y\rightarrow Y(1+M_*/M_\mathrm{gas})$ in the relation makes it remarkably self-similar. This could serve as a robust multiwavelength mass proxy for low-mass clusters and galaxy groups. Our methodology can also be generally useful to improve the domain of validity of other astrophysical scaling relations. We also forecast that measurements of the $Y-M$ relation could provide percent-level constraints on certain combinations of feedback parameters and/or rule out a major part of the parameter space of supernova and AGN feedback models used in current state-of-the-art hydrodynamic simulations. Our results can be useful for using upcoming SZ surveys (e.g., SO, CMB-S4) and galaxy surveys (e.g., DESI and Rubin) to constrain the nature of baryonic feedback. Finally, we find that the an alternative relation, $Y-M_*$, provides complementary information on feedback than $Y-M$


A Generic Approach for Reproducible Model Distillation

arXiv.org Artificial Intelligence

Model distillation has been a popular method for producing interpretable machine learning. It uses an interpretable "student" model to mimic the predictions made by the black box "teacher" model. However, when the student model is sensitive to the variability of the data sets used for training even when keeping the teacher fixed, the corresponded interpretation is not reliable. Existing strategies stabilize model distillation by checking whether a large enough corpus of pseudo-data is generated to reliably reproduce student models, but methods to do so have so far been developed for a specific student model. In this paper, we develop a generic approach for stable model distillation based on central limit theorem for the average loss. We start with a collection of candidate student models and search for candidates that reasonably agree with the teacher. Then we construct a multiple testing framework to select a corpus size such that the consistent student model would be selected under different pseudo samples. We demonstrate the application of our proposed approach on three commonly used intelligible models: decision trees, falling rule lists and symbolic regression. Finally, we conduct simulation experiments on Mammographic Mass and Breast Cancer datasets and illustrate the testing procedure throughout a theoretical analysis with Markov process. The code is publicly available at https://github.com/yunzhe-zhou/GenericDistillation.


Assisting clinical practice with fuzzy probabilistic decision trees

arXiv.org Artificial Intelligence

The need for fully human-understandable models is increasingly being recognised as a central theme in AI research. The acceptance of AI models to assist in decision making in sensitive domains will grow when these models are interpretable, and this trend towards interpretable models will be amplified by upcoming regulations. One of the killer applications of interpretable AI is medical practice, which can benefit from accurate decision support methodologies that inherently generate trust. In this work, we propose FPT, (MedFP), a novel method that combines probabilistic trees and fuzzy logic to assist clinical practice. This approach is fully interpretable as it allows clinicians to generate, control and verify the entire diagnosis procedure; one of the methodology's strength is the capability to decrease the frequency of misdiagnoses by providing an estimate of uncertainties and counterfactuals. Our approach is applied as a proof-of-concept to two real medical scenarios: classifying malignant thyroid nodules and predicting the risk of progression in chronic kidney disease patients. Our results show that probabilistic fuzzy decision trees can provide interpretable support to clinicians, furthermore, introducing fuzzy variables into the probabilistic model brings significant nuances that are lost when using the crisp thresholds set by traditional probabilistic decision trees. We show that FPT and its predictions can assist clinical practice in an intuitive manner, with the use of a user-friendly interface specifically designed for this purpose. Moreover, we discuss the interpretability of the FPT model.