AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

Differentiating Viral and Bacterial Infections: A Machine Learning Model Based on Routine Blood Test Values

Gunčar, Gregor, Kukar, Matjaž, Smole, Tim, Moškon, Sašo, Vovko, Tomaž, Podnar, Simon, Černelč, Peter, Brvar, Miran, Notar, Mateja, Köster, Manca, Jelenc, Marjeta Tušek, Notar, Marko

arXiv.org Artificial Intelligence, May-13-2023

In this study, a Virus vs. Bacteria machine learning model was developed to discern between these infection types using 16 routine blood test results, C-reactive protein levels, biological sex, and age. With a dataset of 44,120 cases from a single medical center, the Virus vs. Bacteria model demonstrated remarkable accuracy of 82.2%, a Brier score of 0.129, and an area under the ROC curve of 0.91, surpassing the performance of traditional CRP decision rule models. The model demonstrates substantially improved accuracy within the CRP range of 10-40 mg/L, an interval in which CRP alone offers limited diagnostic value for distinguishing between bacterial and viral infections. These findings underscore the importance of considering multiple blood parameters for diagnostic decision-making and suggest that the Virus vs. Bacteria model could contribute to the creation of innovative diagnostic tools. Such tools would harness machine learning and relevant biomarkers to support enhanced clinical decision-making in managing infections.
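The Brier score quoted in this abstract is the mean squared difference between a predicted probability and the 0/1 outcome, so lower values indicate better-calibrated predictions. A minimal sketch follows; the probabilities and labels are made-up illustration values, not data from the study.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predicted probabilities of "bacterial" and the true labels.
predicted = [0.9, 0.2, 0.7, 0.4]
actual = [1, 0, 1, 0]
score = brier_score(predicted, actual)
print(round(score, 3))
```

A model that always predicts 0.5 scores 0.25 on any binary data set, which gives a rough reference point for reading a reported Brier score.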

artificial intelligence, infection, machine learning, (15 more...)

arXiv: 2305.07877

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.05)
North America > United States > New York (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Enhancing Robustness of Gradient-Boosted Decision Trees through One-Hot Encoding and Regularization

Cui, Shijie, Sudjianto, Agus, Zhang, Aijun, Li, Runze

arXiv.org Artificial Intelligence, May-11-2023

Gradient-boosted decision trees (GBDT) are a widely used and highly effective machine learning approach for tabular data modeling. However, their complex structure may lead to low robustness against small covariate perturbations in unseen data. In this study, we apply one-hot encoding to convert a GBDT model into a linear framework by encoding each tree leaf as one dummy variable. This allows for the use of linear regression techniques, plus a novel risk decomposition for assessing the robustness of a GBDT model against covariate perturbations. We propose to enhance the robustness of GBDT models by refitting their linear regression forms with $L_1$ or $L_2$ regularization. Theoretical results are obtained about the effect of regularization on the model performance and robustness. It is demonstrated through numerical experiments that the proposed regularization approach can enhance the robustness of the one-hot-encoded GBDT models.
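The leaf-encoding idea in this abstract can be illustrated with a toy ensemble: once every leaf becomes a dummy variable, the boosted prediction is a dot product between the dummies and the leaf values. The sketch below uses two hand-written stumps with hypothetical thresholds and leaf values in place of a fitted model; it shows only the linear rewriting, not the regularized refit or the risk decomposition proposed by the authors.

```python
# Two hand-written stumps stand in for a fitted GBDT (hypothetical values).
# Each stump: (feature index, threshold, left leaf value, right leaf value).
STUMPS = [
    (0, 0.5, 1.0, 3.0),
    (1, 2.0, -0.5, 0.5),
]


def predict_trees(x):
    """Usual GBDT prediction: sum the value of the leaf reached in each tree."""
    total = 0.0
    for feature, threshold, left_value, right_value in STUMPS:
        total += left_value if x[feature] <= threshold else right_value
    return total


def one_hot_leaves(x):
    """One dummy variable per leaf; exactly one is 1 within each tree."""
    encoding = []
    for feature, threshold, _, _ in STUMPS:
        goes_left = x[feature] <= threshold
        encoding.extend([1.0 if goes_left else 0.0, 0.0 if goes_left else 1.0])
    return encoding


# Linear coefficients are the leaf values, in the same order as the dummies.
COEFFICIENTS = [value for _, _, left, right in STUMPS for value in (left, right)]


def predict_linear(x):
    """Same prediction written as a dot product over the leaf dummies."""
    return sum(c * z for c, z in zip(COEFFICIENTS, one_hot_leaves(x)))


sample = [0.3, 2.5]
print(one_hot_leaves(sample), predict_trees(sample), predict_linear(sample))
```

In this form, refitting `COEFFICIENTS` with an $L_1$ or $L_2$ penalty is an ordinary regularized linear regression on the dummy matrix.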

artificial intelligence, machine learning, robustness, (17 more...)

arXiv: 2304.13761

Country:

North America > United States > California (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
North America > United States > North Carolina > Mecklenburg County > Charlotte (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)

Covariance regression with random forests

Alakus, Cansu, Larocque, Denis, Labbe, Aurelie

arXiv.org Machine Learning, May-11-2023

Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important to various fields including neuroscience, epidemiology and biomedicine. We propose a new method called Covariance Regression with Random Forests (CovRegRF) to estimate the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Random forest trees are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the performance of the proposed method and significance test through a simulation study which shows that the proposed method provides accurate covariance matrix estimates and that the Type-1 error is well controlled. An application of the proposed method to thyroid disease data is also presented. CovRegRF is implemented in a freely available R package on CRAN.
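A splitting rule of the kind described in this abstract can be sketched for a two-dimensional response and a single covariate. The score used here, the square root of the product of child sizes times the Euclidean distance between the upper triangles of the child covariance matrices, is an assumption made for illustration; the exact criterion in CovRegRF should be taken from the paper or the R package. The data are invented so that the correlation flips sign halfway along the covariate.

```python
import math


def sample_covariance(rows):
    """Unbiased 2x2 sample covariance of a list of (y1, y2) pairs."""
    n = len(rows)
    mean1 = sum(r[0] for r in rows) / n
    mean2 = sum(r[1] for r in rows) / n
    s11 = sum((r[0] - mean1) ** 2 for r in rows) / (n - 1)
    s22 = sum((r[1] - mean2) ** 2 for r in rows) / (n - 1)
    s12 = sum((r[0] - mean1) * (r[1] - mean2) for r in rows) / (n - 1)
    return s11, s12, s22  # upper triangle only


def split_score(left, right):
    """sqrt(n_L * n_R) times the Euclidean distance between child covariances."""
    distance = math.dist(sample_covariance(left), sample_covariance(right))
    return math.sqrt(len(left) * len(right)) * distance


def best_split(x, y, min_node_size=2):
    """Scan thresholds on one covariate and keep the highest-scoring split."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_threshold, best_score = None, -1.0
    for cut in range(min_node_size, len(x) - min_node_size + 1):
        lo, hi = x[order[cut - 1]], x[order[cut]]
        if lo == hi:
            continue  # no threshold separates tied covariate values
        left = [y[i] for i in order[:cut]]
        right = [y[i] for i in order[cut:]]
        score = split_score(left, right)
        if score > best_score:
            best_threshold, best_score = (lo + hi) / 2, score
    return best_threshold, best_score


# Invented data: responses are positively correlated for x <= 3, negatively after.
x = [1, 2, 3, 4, 5, 6]
y = [(0, 0), (1, 1), (2, 2), (0, 2), (1, 1), (2, 0)]
threshold, score = best_split(x, y)
print(threshold, score)
```

The scan picks the threshold at which the two child covariance matrices differ most, which here is the point where the correlation changes sign.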

artificial intelligence, covariance matrix, machine learning, (17 more...)

arXiv: 2209.08173

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology (1.00)
Health & Medicine > Therapeutic Area > Internal Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

XMI-ICU: Explainable Machine Learning Model for Pseudo-Dynamic Prediction of Mortality in the ICU for Heart Attack Patients

Mesinovic, Munib, Watkinson, Peter, Zhu, Tingting

arXiv.org Artificial Intelligence, May-10-2023

Heart attacks remain one of the greatest contributors to mortality in the United States and globally. Patients admitted to the intensive care unit (ICU) with a diagnosed heart attack (myocardial infarction, or MI) are at higher risk of death. In this study, we use two retrospective cohorts extracted from the eICU and MIMIC-IV databases to develop a novel pseudo-dynamic machine learning framework for mortality prediction in the ICU with interpretability and clinical risk analysis. The method provides accurate prediction for ICU patients up to 24 hours before the event and provides time-resolved interpretability results. The performance of the framework, which relies on extreme gradient boosting, was evaluated on a held-out test set from eICU and externally validated on the MIMIC-IV cohort using the most important features identified by time-resolved Shapley values, achieving AUCs of 91.0 (balanced accuracy of 82.3) for 6-hour prediction of mortality. We show that our framework successfully leverages time-series physiological measurements by translating them into stacked static prediction problems, so that it is robustly predictive throughout the ICU stay and can offer clinical insight from time-resolved interpretability.

artificial intelligence, machine learning, prediction, (14 more...)

arXiv: 2305.06109

Country:

North America > United States (0.34)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Netherlands (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

A Kriging-Random Forest Hybrid Model for Real-time Ground Property Prediction during Earth Pressure Balance Shield Tunneling

Geng, Ziheng, Zhang, Chao, Ren, Yuhao, Zhu, Minxiang, Chen, Renpeng, Cheng, Hongzhan

arXiv.org Artificial Intelligence, May-8-2023

A Kriging-random forest (KRF) hybrid model is developed for real-time ground property prediction ahead of the earth pressure balance (EPB) shield by integrating Kriging extrapolation and random forest, which can guide the selection of shield operating parameters and thereby mitigate construction risks. The proposed KRF algorithm synergizes two types of information: prior information and real-time information. The previously predicted ground properties with EPB operating parameters are extrapolated via the Kriging algorithm to provide prior information for predicting the properties of the ground currently being excavated. The real-time information refers to the real-time operating parameters of the EPB shield, which are input into a random forest to provide a real-time prediction of ground properties. The two predictions are integrated by assigning a weight to each according to its uncertainty, so that the KRF prediction has minimum uncertainty. The performance of the KRF algorithm is assessed via a case study of the Changsha Metro Line 4 project. It reveals that the proposed KRF algorithm can predict ground properties with an accuracy of 93%, outperforming the existing algorithms of LightGBM, AdaBoost-CART, and DNN by 29%, 8%, and 12%, respectively. Another dataset, from the Shenzhen Metro Line 13 project, is used to further evaluate the model's generalization performance, revealing that the model can transfer its learned knowledge from one region to another with an accuracy of 89%.
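Weighting two predictions by their uncertainties is commonly done with inverse-variance weighting, which minimizes the variance of the combined estimate when the two errors are independent and unbiased. The sketch below assumes that setting; whether the paper uses exactly this formula is not stated in the abstract, and the function name and numbers are hypothetical.

```python
def fuse_predictions(mean_prior, var_prior, mean_realtime, var_realtime):
    """Combine two estimates with weights inversely proportional to variance."""
    if var_prior <= 0 or var_realtime <= 0:
        raise ValueError("variances must be positive")
    # The less certain estimate receives the smaller weight.
    weight_prior = var_realtime / (var_prior + var_realtime)
    weight_realtime = 1.0 - weight_prior
    fused_mean = weight_prior * mean_prior + weight_realtime * mean_realtime
    # Variance of the weighted sum under the independence assumption.
    fused_var = var_prior * var_realtime / (var_prior + var_realtime)
    return fused_mean, fused_var


# Hypothetical numbers: a vague prior (variance 4) and a sharper real-time estimate.
fused_mean, fused_var = fuse_predictions(10.0, 4.0, 14.0, 1.0)
print(fused_mean, fused_var)
```

The fused variance is never larger than the smaller of the two input variances, which is the sense in which the combination has minimum uncertainty.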

machine learning, prediction, real time system, (19 more...)

arXiv: 2305.05128

Country: Asia > China > Guangdong Province > Shenzhen (0.25)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(3 more...)

When a CBR in Hand is Better than Twins in the Bush

Ahmed, Mobyen Uddin, Barua, Shaibal, Begum, Shahina, Islam, Mir Riyanul, Weber, Rosina O

arXiv.org Artificial Intelligence, May-8-2023

AI methods referred to as interpretable are often discredited as inaccurate by supporters of the existence of a trade-off between interpretability and accuracy. In many problem contexts, however, this trade-off does not hold. This paper discusses a regression problem context, predicting flight take-off delays, in which the most accurate data regression model was trained via the XGBoost implementation of gradient-boosted decision trees. When an XGB-CBR Twin is built and the XGBoost feature importance is converted into global weights in the CBR model, the resultant CBR model alone provides the most accurate local prediction, maintains the global importance to provide a global explanation of the model, and offers the most interpretable representation for local explanations. This resultant CBR model becomes a benchmark of accuracy and interpretability for this problem context, and hence it is used to evaluate the two additive feature attribution methods, SHAP and LIME, in explaining the XGBoost regression model.
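Turning feature importances into global weights for case retrieval can be sketched as a weighted nearest-neighbour lookup. Everything below is an assumption for illustration: the importances are invented rather than taken from an XGBoost model, the case base has three made-up cases, and retrieval returns only the single closest case.

```python
def normalise(importances):
    """Scale non-negative importances so they sum to 1."""
    total = sum(importances)
    return [value / total for value in importances]


def weighted_distance(a, b, weights):
    """Squared Euclidean distance with one global weight per feature."""
    return sum(w * (p - q) ** 2 for w, p, q in zip(weights, a, b))


def retrieve(query, case_base, weights):
    """Return the stored case closest to the query under the weighted metric."""
    return min(case_base, key=lambda case: weighted_distance(query, case[0], weights))


# Hypothetical case base: (feature vector, observed delay in minutes).
CASES = [([0.0, 0.0], 5.0), ([1.0, 0.0], 20.0), ([0.0, 1.0], 8.0)]
QUERY = [0.7, 0.6]

uniform = retrieve(QUERY, CASES, normalise([1.0, 1.0]))
informed = retrieve(QUERY, CASES, normalise([1.0, 9.0]))  # assumed importances
print(uniform[1], informed[1])
```

With uniform weights the query retrieves the second case; once the second feature is weighted nine times more heavily, the third case becomes the nearest, so the predicted delay changes with the global weights.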

artificial intelligence, machine learning, prediction, (15 more...)

arXiv: 2305.05111

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Air (1.00)
Transportation > Infrastructure & Services (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)

Machine Learning Benchmarks for the Classification of Equivalent Circuit Models from Electrochemical Impedance Spectra

Schaeffer, Joachim, Gasper, Paul, Garcia-Tamayo, Esteban, Gasper, Raymond, Adachi, Masaki, Gaviria-Cardona, Juan Pablo, Montoya-Bedoya, Simon, Bhutani, Anoushka, Schiek, Andrew, Goodall, Rhys, Findeisen, Rolf, Braatz, Richard D., Engelke, Simon

arXiv.org Artificial Intelligence, May-4-2023

Analysis of Electrochemical Impedance Spectroscopy (EIS) data for electrochemical systems often consists of defining an Equivalent Circuit Model (ECM) using expert knowledge and then optimizing the model parameters to deconvolute various resistance, capacitive, inductive, or diffusion responses. For small data sets, this procedure can be conducted manually; however, it is not feasible to manually define a proper ECM for extensive data sets with a wide range of EIS responses. Automatic identification of an ECM would substantially accelerate the analysis of large sets of EIS data. We showcase machine learning methods to classify the ECMs of 9,300 impedance spectra provided by QuantumScape for the BatteryDEV hackathon. The best-performing approach is a gradient-boosted tree model utilizing a library to automatically generate features, followed by a random forest model using the raw spectral data. A convolutional neural network using boolean images of Nyquist representations is presented as an alternative, although it achieves a lower accuracy. We publish the data and open source the associated code. The approaches described in this article can serve as benchmarks for further studies. A key remaining challenge is the identifiability of the labels, underlined by the model performances and the comparison of misclassified spectra.

artificial intelligence, bayesian inference, machine learning, (16 more...)

doi: 10.1149/1945-7111/acd8fb

arXiv: 2302.03362

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
(6 more...)

Genre: Research Report (1.00)

Industry:

Energy > Energy Storage (1.00)
Electrical Industrial Apparatus (1.00)
Government > Regional Government (0.93)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
(2 more...)

Gradient-less Federated Gradient Boosting Trees with Learnable Learning Rates

Ma, Chenyang, Qiu, Xinchi, Beutel, Daniel J., Lane, Nicholas D.

arXiv.org Artificial Intelligence, May-2-2023

The privacy-sensitive nature of decentralized datasets and the robustness of eXtreme Gradient Boosting (XGBoost) on tabular data raise the need to train XGBoost in the context of federated learning (FL). Existing works on federated XGBoost in the horizontal setting rely on the sharing of gradients, which induces per-node level communication frequency and serious privacy concerns. To alleviate these problems, we develop an innovative framework for horizontal federated XGBoost which does not depend on the sharing of gradients and simultaneously boosts privacy and communication efficiency by making the learning rates of the aggregated tree ensembles learnable. We conduct extensive evaluations on various classification and regression datasets, showing that our approach achieves performance comparable to the state-of-the-art method and effectively improves communication efficiency by lowering both communication rounds and communication overhead by factors ranging from 25x to 700x.

artificial intelligence, machine learning, tree ensemble, (18 more...)

doi: 10.1145/3578356.3592579

arXiv: 2304.07537

Country:

North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Nepal (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Predict NAS Multi-Task by Stacking Ensemble Models using GP-NAS

Zhang, Ke

arXiv.org Artificial Intelligence, May-2-2023

Accurately predicting the performance of an architecture from a small training sample is an important but difficult task. The core problem is how to analyze the dataset and train on it without overfitting. In a multi-task setting, we should also consider whether the correlation between tasks can be exploited and how to estimate as quickly as possible. In this track, the Super Network builds a search space based on ViT-Base. The search space contains depth, num-heads, mlp-ratio and embed-dim. We first pre-process the data based on our understanding of the problem, which reduces the complexity of the problem and the probability of overfitting. We then tried different kinds of models and different ways of combining them. Finally, we chose stacking ensemble models using GP-NAS with cross-validation. Our stacking model ranked 1st in the CVPR 2022 Track 2 Challenge.

artificial intelligence, machine learning, prediction, (17 more...)

arXiv: 2305.01667

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)

Interpreting Deep Forest through Feature Contribution and MDI Feature Importance

He, Yi-Xiao, Lyu, Shen-Huan, Jiang, Yuan

arXiv.org Artificial Intelligence, May-1-2023

Deep forest is a non-differentiable deep model which has achieved impressive empirical success across a wide variety of applications, especially on categorical/symbolic or mixed modeling tasks. Many of the application fields prefer explainable models, such as random forests with feature contributions that can provide local explanation for each prediction, and Mean Decrease Impurity (MDI) that can provide global feature importance. However, deep forest, as a cascade of random forests, possesses interpretability only at the first layer. From the second layer on, many of the tree splits occur on the new features generated by the previous layer, which makes existing explanatory tools for random forests inapplicable. To disclose the impact of the original features in the deep layers, we design a calculation method with an estimation step followed by a calibration step for each layer, and propose our feature contribution and MDI feature importance calculation tools for deep forest. Experimental results on both simulated data and real world data verify the effectiveness of our methods.
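Mean Decrease Impurity, mentioned in this abstract, credits each feature with the impurity reduction of the splits made on it, weighted by the fraction of samples reaching the split. The sketch below computes it for one small hand-written classification tree with Gini impurity; the tree and its class counts are hypothetical, and this is the standard single-tree calculation rather than the layer-wise estimation and calibration procedure the authors propose for deep forest.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def mdi(node, n_root, importance=None):
    """Accumulate weighted impurity decrease per split feature, recursively."""
    if importance is None:
        importance = {}
    if "feature" not in node:  # leaf: nothing to add
        return importance
    n_node = sum(node["counts"])
    decrease = gini(node["counts"])
    for child in (node["left"], node["right"]):
        decrease -= sum(child["counts"]) / n_node * gini(child["counts"])
    feature = node["feature"]
    importance[feature] = importance.get(feature, 0.0) + n_node / n_root * decrease
    for child in (node["left"], node["right"]):
        mdi(child, n_root, importance)
    return importance


# Hypothetical tree: each node stores class counts; internal nodes name a feature.
TREE = {
    "feature": 0, "counts": [5, 5],
    "left": {
        "feature": 1, "counts": [4, 1],
        "left": {"counts": [4, 0]},
        "right": {"counts": [0, 1]},
    },
    "right": {"counts": [1, 4]},
}
scores = mdi(TREE, n_root=sum(TREE["counts"]))
print(scores)
```

For a forest, these per-tree scores are typically averaged over trees. The difficulty raised in the abstract is that, from the second layer of a deep forest onward, many splits occur on features generated by the previous layer rather than on the original inputs.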

artificial intelligence, contribution, machine learning, (18 more...)

arXiv: 2305.00805

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
