AITopics

2501.01117

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
(6 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Brown, Meredith G. L., Peterson, Matt, Tezaur, Irina, Peterson, Kara, Bull, Diana

Random Forest Regression Feature Importance for Climate Impact Pathway Detection

arXiv.org Artificial IntelligenceDec-25-2024

Disturbances to the climate system, both natural and anthropogenic, have far reaching impacts that are not always easy to identify or quantify using traditional climate science analyses or causal modeling techniques. In this paper, we develop a novel technique for discovering and ranking the chain of spatio-temporal downstream impacts of a climate source, referred to herein as a source-impact pathway, using Random Forest Regression (RFR) and SHapley Additive exPlanation (SHAP) feature importances. Rather than utilizing RFR for classification or regression tasks (the most common use case for RFR), we propose a fundamentally new workflow in which we: (i) train random forest (RF) regressors on a set of spatio-temporal features of interest, (ii) calculate their pair-wise feature importances using the SHAP weights associated with those features, and (iii) translate these feature importances into a weighted pathway network (i.e., a weighted directed graph), which can be used to trace out and rank interdependencies between climate features and/or modalities. Importantly, while herein we employ RFR and SHAP feature importance in steps (i) and (ii) of our algorithm, our novel workflow is in no way tied to these approaches, which could be replaced with any regression method and sensitivity method. We adopt a tiered verification approach to verify our new pathway identification methodology. In this approach, we apply our method to ensembles of data generated by running two increasingly complex benchmarks: (i) a set of synthetic coupled equations, and (ii) a fully coupled simulation of the 1991 eruption of Mount Pinatubo in the Philippines performed using a modified version 2 of the U.S. Department of Energy's Energy Exascale Earth System Model (E3SMv2). We find that our RFR feature importance-based approach can accurately detect known pathways of impact for both test cases.

artificial intelligence, machine learning, pathway, (14 more...)

2409.16609

Country: North America > United States (1.00)

Genre:

Workflow (1.00)
Research Report > Promising Solution (0.34)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)

F., Paulo C. Marques, Artes, Rinaldo, Graziadei, Helton

Projected random forests and conformal prediction of circular data

arXiv.org Machine LearningDec-25-2024

We apply split conformal prediction techniques to regression problems with circular responses by introducing a suitable conformity score, leading to prediction sets with adaptive arc length and finite-sample coverage guarantees for any circular predictive model under exchangeable data. Leveraging the high performance of existing predictive models designed for linear responses, we analyze a general projection procedure that converts any linear response regression model into one suitable for circular responses. When random forests serve as basis models in this projection procedure, we harness the out-of-bag dynamics to eliminate the necessity for a separate calibration sample in the construction of prediction sets. For synthetic and real datasets the resulting projected random forests model produces more efficient out-of-bag conformal prediction sets, with shorter median arc length, when compared to the split conformal prediction sets generated by two existing alternative models.

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Machine Learning

2410.24145

Country:

South America > Brazil (0.29)
Europe > Austria (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Cai, Mingyang, Klausch, Thomas, van de Wiel, Mark A.

A Semi-supervised CART Model for Covariate Shift

arXiv.org Artificial IntelligenceDec-22-2024

Machine learning models used in medical applications often face challenges due to the covariate shift, which occurs when there are discrepancies between the distributions of training and target data. This can lead to decreased predictive accuracy, especially with unknown outcomes in the target data. This paper introduces a semi-supervised classification and regression tree (CART) that uses importance weighting to address these distribution discrepancies. Our method improves the predictive performance of the CART model by assigning greater weights to training samples that more accurately represent the target distribution, especially in cases of covariate shift without target outcomes. In addition to CART, we extend this weighted approach to generalized linear model trees and tree ensembles, creating a versatile framework for managing the covariate shift in complex datasets. Through simulation studies and applications to real-world medical data, we demonstrate significant improvements in predictive accuracy. These findings suggest that our weighted approach can enhance reliability in medical applications and other fields where the covariate shift poses challenges to model performance across various data distributions.

artificial intelligence, decision tree learning, machine learning, (16 more...)

2410.20978

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.93)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Żychowski, Adam, Perrault, Andrew, Mańdziuk, Jacek

Cultivating Archipelago of Forests: Evolving Robust Decision Trees through Island Coevolution

Decision trees are widely used in machine learning due to their simplicity and interpretability, but they often lack robustness to adversarial attacks and data perturbations. The paper proposes a novel island-based coevolutionary algorithm (ICoEvoRDF) for constructing robust decision tree ensembles. The algorithm operates on multiple islands, each containing populations of decision trees and adversarial perturbations. The populations on each island evolve independently, with periodic migration of top-performing decision trees between islands. This approach fosters diversity and enhances the exploration of the solution space, leading to more robust and accurate decision tree ensembles. ICoEvoRDF utilizes a popular game theory concept of mixed Nash equilibrium for ensemble weighting, which further leads to improvement in results. ICoEvoRDF is evaluated on 20 benchmark datasets, demonstrating its superior performance compared to state-of-the-art methods in optimizing both adversarial accuracy and minimax regret. The flexibility of ICoEvoRDF allows for the integration of decision trees from various existing methods, providing a unified framework for combining diverse solutions. Our approach offers a promising direction for developing robust and interpretable machine learning models

artificial intelligence, island, machine learning, (15 more...)

2412.13762

Country:

Europe > Poland > Lesser Poland Province > Kraków (0.04)
North America > United States > Ohio (0.04)
North America > United States > Michigan (0.04)
(2 more...)

Genre:

Research Report > Promising Solution (0.66)
Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry:

Government (0.34)
Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Ayllón-Gavilán, Rafael, Martínez-Estudillo, Francisco José, Guijo-Rubio, David, Hervás-Martínez, César, Gutiérrez, Pedro Antonio

Splitting criteria for ordinal decision trees: an experimental study

Ordinal Classification (OC) is a machine learning field that addresses classification tasks where the labels exhibit a natural order. Unlike nominal classification, which treats all classes as equally distinct, OC takes the ordinal relationship into account, producing more accurate and relevant results. This is particularly critical in applications where the magnitude of classification errors has implications. Despite this, OC problems are often tackled using nominal methods, leading to suboptimal solutions. Although decision trees are one of the most popular classification approaches, ordinal tree-based approaches have received less attention when compared to other classifiers. This work conducts an experimental study of tree-based methodologies specifically designed to capture ordinal relationships. A comprehensive survey of ordinal splitting criteria is provided, standardising the notations used in the literature for clarity. Three ordinal splitting criteria, Ordinal Gini (OGini), Weighted Information Gain (WIG), and Ranking Impurity (RI), are compared to the nominal counterparts of the first two (Gini and information gain), by incorporating them into a decision tree classifier. An extensive repository considering 45 publicly available OC datasets is presented, supporting the first experimental comparison of ordinal and nominal splitting criteria using well-known OC evaluation metrics. Statistical analysis of the results highlights OGini as the most effective ordinal splitting criterion to date. Source code, datasets, and results are made available to the research community.

artificial intelligence, machine learning, splitting criteria, (17 more...)

2412.13697

Country:

Europe > Spain > Andalusia > Córdoba Province > Córdoba (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California > Monterey County > Monterey (0.04)
(4 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.85)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

The Certainty Ratio $C_\rho$: a novel metric for assessing the reliability of classifier predictions

Aguilar-Ruiz, Jesus S.

Evaluating the performance of classifiers is critical in machine learning, particularly in high-stakes applications where the reliability of predictions can significantly impact decision-making. Traditional performance measures, such as accuracy and F-score, often fail to account for the uncertainty inherent in classifier predictions, leading to potentially misleading assessments. This paper introduces the Certainty Ratio ($C_\rho$), a novel metric designed to quantify the contribution of confident (certain) versus uncertain predictions to any classification performance measure. By integrating the Probabilistic Confusion Matrix ($CM^\star$) and decomposing predictions into certainty and uncertainty components, $C_\rho$ provides a more comprehensive evaluation of classifier reliability. Experimental results across 21 datasets and multiple classifiers, including Decision Trees, Naive-Bayes, 3-Nearest Neighbors, and Random Forests, demonstrate that $C_\rho$ reveals critical insights that conventional metrics often overlook. These findings emphasize the importance of incorporating probabilistic information into classifier evaluation, offering a robust tool for researchers and practitioners seeking to improve model trustworthiness in complex environments.

artificial intelligence, decision tree learning, machine learning, (5 more...)

2411.01973

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.53)

Mastering AI: Big Data, Deep Learning, and the Evolution of Large Language Models -- AutoML from Basics to State-of-the-Art Techniques

Feng, Pohsun, Bi, Ziqian, Wen, Yizhu, Peng, Benji, Liu, Junyu, Yin, Caitlyn Heqi, Wang, Tianyang, Chen, Keyu, Zhang, Sen, Li, Ming, Xu, Jiawei, Liu, Ming, Pan, Xuanhe, Wang, Jinlang, Niu, Qian

In recent years, Artificial Intelligence (AI) and Machine Learning (ML) have grown tremendously in popularity across various industries. From healthcare and finance to retail and automotive, adopting machine learning models has led to significant advancements[1]. However, building machine learning models traditionally requires deep knowledge in multiple areas, such as data preprocessing, feature engineering, model selection, hyperparameter tuning, and evaluation[2]. For many beginners and even experienced practitioners, this process can be time-consuming and technically challenging. This is where AutoML (Automated Machine Learning) comes in. AutoML simplifies the process of building machine learning models by automating many of the steps that would otherwise require manual intervention [3]. AutoML tools can automatically preprocess data, select the most suitable algorithms, and fine-tune hyperparameters to produce highly accurate models [4]. This automation not only speeds up the model development cycle but also allows users without deep knowledge of machine learning to create models with comparable performance to those made by experienced data scientists.

artificial intelligence, classification task, machine learning, (20 more...)

2410.09596

Country:

North America > United States > California (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(9 more...)

Genre:

Workflow (1.00)
Overview (1.00)
Instructional Material (0.92)
Research Report > Promising Solution (0.50)

Industry:

Banking & Finance > Trading (0.92)
Information Technology > Services (0.67)
Health & Medicine > Diagnostic Medicine > Imaging (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Phelps, Nathan, Lizotte, Daniel J., Woolford, Douglas G.

Challenges learning from imbalanced data using tree-based models: Prevalence estimates systematically depend on hyperparameters and can be upwardly biased

arXiv.org Machine LearningDec-17-2024

Imbalanced binary classification problems arise in many fields of study. When using machine learning models for these problems, it is common to subsample the majority class (i.e., undersampling) to create a (more) balanced dataset for model training. This biases the model's predictions because the model learns from a dataset that does not follow the same data generating process as new data. One way of accounting for this bias is to analytically map the resulting predictions to new values based on the sampling rate for the majority class, which was used to create the training dataset. While this approach may work well for some machine learning models, we have found that calibrating a random forest this way has unintended negative consequences, including prevalence estimates that can be upwardly biased. These prevalence estimates depend on both i) the number of predictors considered at each split in the random forest; and ii) the sampling rate used. We explain the former using known properties of random forests and analytical calibration. However, in investigating the latter issue, we made a surprising discovery - contrary to the widespread belief that decision trees are biased towards the majority class, they actually can be biased towards the minority class.

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Machine Learning

2412.16209

Country:

North America > Canada > Ontario > Middlesex County > London (0.14)
North America > Canada > Alberta (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Love, Peter . E. D., Matthews, Jane, Fang, Weili, Mahamivanan, Hadi

Integrating Evidence into the Design of XAI and AI-based Decision Support Systems: A Means-End Framework for End-users in Construction

arXiv.org Artificial IntelligenceDec-17-2024

A narrative review is used to develop a theoretical evidence-based means-end framework to build an epistemic foundation to uphold explainable artificial intelligence instruments so that the reliability of outcomes generated from decision support systems can be assured and better explained to end-users. The implications of adopting an evidence-based approach to designing decision support systems in construction are discussed with emphasis placed on evaluating the strength, value, and utility of evidence needed to develop meaningful human explanations for end-users. While the developed means-end framework is focused on end-users, stakeholders can also utilize it to create meaningful human explanations. However, they will vary due to their different epistemic goals. Including evidence in the design and development of explainable artificial intelligence and decision support systems will improve decision-making effectiveness, enabling end-users' epistemic goals to be achieved. The proposed means-end framework is developed from a broad spectrum of literature. Thus, it is suggested that it can be used in construction and other engineering domains where there is a need to integrate evidence into the design of explainable artificial intelligence and decision support systems.

decision support system, explanation, machine learning, (15 more...)

2412.14209

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
Oceania > Australia > Western Australia > Perth (0.04)
(13 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Construction & Engineering (1.00)
Information Technology > Security & Privacy (0.93)
Materials > Construction Materials (0.67)
(2 more...)

Technology:

Information Technology > Decision Support Systems (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)