AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

Effective Predictive Modeling for Emergency Department Visits and Evaluating Exogenous Variables Impact: Using Explainable Meta-learning Gradient Boosting

Neshat, Mehdi, Phipps, Michael, Jha, Nikhil, Khojasteh, Danial, Tong, Michael, Gandomi, Amir

arXiv.org Artificial Intelligence, Nov-17-2024

Over an extensive duration, administrators and clinicians have endeavoured to predict Emergency Department (ED) visits with precision, aiming to optimise resource distribution. Despite the proliferation of diverse AI-driven models tailored for precise prognostication, this task persists as a formidable challenge, besieged by constraints such as restrained generalisability, susceptibility to overfitting and underfitting, scalability issues, and complex hyper-parameter fine-tuning. In this study, we introduce a novel Meta-learning Gradient Booster (Meta-ED) approach for precisely forecasting daily ED visits, leveraging a comprehensive dataset of exogenous variables, including socio-demographic characteristics, healthcare service use, chronic diseases, diagnosis, and climate parameters spanning 23 years from Canberra Hospital in ACT, Australia. The proposed Meta-ED consists of four foundational learners (CatBoost, Random Forest, Extra Tree, and LightGBoost) alongside a dependable top-level learner, a Multi-Layer Perceptron (MLP), which combines the unique capabilities of the varied base models (sub-learners). Our study assesses the efficacy of the Meta-ED model through an extensive comparative analysis involving 23 models. The evaluation outcomes reveal a notable superiority of Meta-ED over the other models in accuracy at 85.7% (95% CI: 85.4%, 86.0%) and across a spectrum of 10 evaluation metrics. Notably, when compared with the prominent techniques XGBoost, Random Forest (RF), AdaBoost, LightGBoost, and Extra Tree (ExT), Meta-ED showcases substantial accuracy enhancements of 58.6%, 106.3%, 22.3%, 7.0%, and 15.7%, respectively. Furthermore, incorporating weather-related features demonstrates a 3.25% improvement in the prediction accuracy of visitors' numbers. The encouraging outcomes of our study underscore Meta-ED as a foundation model for the precise prediction of daily ED visitors.
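
The two-level design in this abstract (base learners whose predictions feed a top-level learner) is commonly called stacking. The sketch below illustrates only that general idea and is not the Meta-ED implementation: the base learners are a least-squares line and a k-nearest-neighbour average rather than the four tree ensembles, the top-level learner is a two-weight linear blend rather than an MLP, and every function name and the toy data are assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Base learner 1 (illustrative): ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    """Base learner 2 (illustrative): mean of the k nearest training targets."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


BASE_FITTERS = (fit_line, fit_knn)


def out_of_fold_predictions(xs, ys, n_folds=5):
    """Predict each point with base models that never saw it during fitting."""
    n = len(xs)
    oof = [[0.0] * len(BASE_FITTERS) for _ in range(n)]
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        kept = [i for i in range(n) if i % n_folds != fold]
        train_x = [xs[i] for i in kept]
        train_y = [ys[i] for i in kept]
        for j, fitter in enumerate(BASE_FITTERS):
            model = fitter(train_x, train_y)
            for i in held_out:
                oof[i][j] = model(xs[i])
    return oof


def fit_meta_weights(oof, ys):
    """Top-level learner (illustrative): least-squares blend y ~ w0*p0 + w1*p1."""
    a = sum(p[0] * p[0] for p in oof)
    b = sum(p[0] * p[1] for p in oof)
    d = sum(p[1] * p[1] for p in oof)
    e = sum(p[0] * y for p, y in zip(oof, ys))
    f = sum(p[1] * y for p, y in zip(oof, ys))
    det = a * d - b * b
    if det == 0:
        raise ValueError("base predictions are collinear; cannot blend")
    return ((e * d - b * f) / det, (a * f - b * e) / det)


def fit_stack(xs, ys, n_folds=5):
    """Fit blend weights on out-of-fold predictions, then refit bases on all data."""
    weights = fit_meta_weights(out_of_fold_predictions(xs, ys, n_folds), ys)
    models = [fitter(xs, ys) for fitter in BASE_FITTERS]
    return (lambda x: sum(w * m(x) for w, m in zip(weights, models))), weights


def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


# Toy usage on synthetic data (not ED visit data).
toy_x = [i / 10 for i in range(40)]
toy_y = [0.5 * x + math.sin(3 * x) for x in toy_x]
stack_predict, stack_weights = fit_stack(toy_x, toy_y)
print(stack_weights, stack_predict(1.0))
```

Fitting the blend on out-of-fold predictions, rather than on in-sample predictions, keeps the top-level learner from simply rewarding whichever base model memorises the training data best.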

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Artificial Intelligence

2411.11275

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.25)
Asia > Taiwan (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Promising Solution (0.92)
Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

An Oversampling-enhanced Multi-class Imbalanced Classification Framework for Patient Health Status Prediction Using Patient-reported Outcomes

Yan, Yang, Chen, Zhong, Xu, Cai, Shen, Xinglei, Shiao, Jay, Einck, John, Chen, Ronald C, Gao, Hao

arXiv.org Artificial Intelligence, Nov-16-2024

Patient-reported outcomes (PROs) directly collected from cancer patients being treated with radiation therapy play a vital role in assisting clinicians in counseling patients regarding likely toxicities. Precise prediction and evaluation of symptoms or health status associated with PROs are fundamental to enhancing decision-making and planning for the required services and support as patients transition into survivorship. However, the raw PRO data collected from hospitals exhibits some intrinsic challenges such as incomplete item reports and imbalanced patient toxicities. To this end, in this study, we explore various machine learning techniques to predict patient outcomes related to health status, such as pain levels and sleep discomfort, using PRO datasets from a cancer photon/proton therapy center. Specifically, we deploy six advanced machine learning classifiers (Random Forest (RF), XGBoost, Gradient Boosting (GB), Support Vector Machine (SVM), Multi-Layer Perceptron with Bagging (MLP-Bagging), and Logistic Regression (LR)) to tackle a multi-class imbalanced classification problem across three prevalent cancer types: head and neck, prostate, and breast cancers. To address the class imbalance issue, we employ an oversampling strategy, adjusting the training set sample sizes through interpolations of in-class neighboring samples, thereby augmenting minority classes without deviating from the original skewed class distribution. Our experimental findings across multiple PRO datasets indicate that the RF and XGBoost methods achieve robust generalization performance, evidenced by weighted AUC and detailed confusion matrices, in categorizing outcomes as mild, intermediate, and severe post-radiation therapy. These results underscore the models' effectiveness and potential utility in clinical settings.
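
Oversampling "through interpolations of in-class neighboring samples" resembles the SMOTE family of methods. The sketch below shows that interpolation step for a single minority class. The abstract does not give the paper's exact procedure, so the neighbour count `k`, the uniform interpolation factor, and the function name are assumptions.

```python
import random


def interpolate_minority(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a
    minority-class sample and one of its k nearest in-class neighbours.

    samples: list of equal-length numeric tuples from ONE class (at least 2).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Nearest in-class neighbours by squared Euclidean distance.
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return synthetic


# Toy usage: four 2-D minority samples, ten synthetic additions.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(interpolate_minority(minority, n_new=10, k=2, seed=1)[:3])
```

Each synthetic point lies between two real samples of the same class, so it stays inside the region that class already occupies. Only the training split should be oversampled; evaluation data keeps its original class distribution.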

artificial intelligence, dataset, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.10819

Country:

South America > Uruguay > Maldonado > Maldonado (0.04)
North America > United States > Texas > El Paso County > El Paso (0.04)
North America > United States > Missouri > Jackson County > Kansas City (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Semiparametric inference for impulse response functions using double/debiased machine learning

Ballinari, Daniele, Wehrli, Alexander

arXiv.org Machine Learning, Nov-15-2024

We introduce a double/debiased machine learning (DML) estimator for the impulse response function (IRF) in settings where a time series of interest is subjected to multiple discrete treatments, assigned over time, which can have a causal effect on future outcomes. The proposed estimator can rely on fully nonparametric relations between treatment and outcome variables, opening up the possibility to use flexible machine learning approaches to estimate IRFs. To this end, we extend the theory of DML from an i.i.d. to a time series setting and show that the proposed DML estimator for the IRF is consistent and asymptotically normally distributed at the parametric rate, allowing for semiparametric inference for dynamic effects in a time series setting. The properties of the estimator are validated numerically in finite samples by applying it to learn the IRF in the presence of serial dependence in both the confounder and observation innovation processes. We also illustrate the methodology empirically by applying it to the estimation of the effects of macroeconomic shocks.

bias std, estimator, nuisance function, (14 more...)

arXiv.org Machine Learning

2411.10009

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(4 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Banking & Finance > Economy (1.00)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)

Interaction Testing in Variation Analysis

Plecko, Drago

arXiv.org Artificial Intelligence, Nov-13-2024

Relationships of cause and effect are of prime importance for explaining scientific phenomena. Often, rather than just understanding the effects of causes, researchers also wish to understand how a cause $X$ affects an outcome $Y$ mechanistically, i.e., what are the causal pathways that are activated between $X$ and $Y$. For analyzing such questions, a range of methods has been developed over decades under the rubric of causal mediation analysis. Traditional mediation analysis focuses on decomposing the average treatment effect (ATE) into direct and indirect effects, and therefore focuses on the ATE as the central quantity. This corresponds to providing explanations for associations in the interventional regime, such as when the treatment $X$ is randomized. Commonly, however, it is of interest to explain associations in the observational regime, and not just in the interventional regime. In this paper, we introduce variation analysis, an extension of mediation analysis that focuses on the total variation (TV) measure between $X$ and $Y$, written as $\mathrm{E}[Y \mid X=x_1] - \mathrm{E}[Y \mid X=x_0]$. The TV measure encompasses both causal and confounded effects, as opposed to the ATE which only encompasses causal (direct and mediated) variations. In this way, the TV measure is suitable for providing explanations in the natural regime and answering questions such as "why is $X$ associated with $Y$?". Our focus is on decomposing the TV measure, in a way that explicitly includes direct, indirect, and confounded variations. Furthermore, we also decompose the TV measure to include interaction terms between these different pathways. Subsequently, interaction testing is introduced, involving hypothesis tests to determine if interaction terms are significantly different from zero. If interactions are not significant, more parsimonious decompositions of the TV measure can be used.
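
The TV measure in this abstract is a difference of two conditional means, so a plug-in estimate from observational data takes only a few lines. The sketch below assumes that sample means stand in for the expectations; the function name and the toy data are hypothetical. It estimates the TV measure only and does not attempt the paper's decomposition or interaction tests.

```python
def total_variation(xs, ys, x1=1, x0=0):
    """Plug-in estimate of E[Y | X = x1] - E[Y | X = x0] using sample means."""
    y_at_x1 = [y for x, y in zip(xs, ys) if x == x1]
    y_at_x0 = [y for x, y in zip(xs, ys) if x == x0]
    if not y_at_x1 or not y_at_x0:
        raise ValueError("both treatment levels must appear in the data")
    return sum(y_at_x1) / len(y_at_x1) - sum(y_at_x0) / len(y_at_x0)


# Toy usage: the X = 1 group averages 5.0 and the X = 0 group averages 2.0.
x_obs = [0, 0, 1, 1, 1]
y_obs = [1.0, 3.0, 4.0, 5.0, 6.0]
print(total_variation(x_obs, y_obs))
```

Because the data are observational, this difference mixes causal and confounded variation, which is exactly why the abstract distinguishes it from the ATE.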

decomposition, interaction, spurious effect, (16 more...)

arXiv.org Artificial Intelligence

2411.08861

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Greenland (0.04)
(7 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (0.88)
Health & Medicine > Therapeutic Area (0.67)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.45)

An Explainable Machine Learning Approach for Age and Gender Estimation in Living Individuals Using Dental Biometrics

Ali, Mohsin, Raza, Haider, Gan, John Q, Pokhojaev, Ariel, Katz, Matanel, Kosan, Esra, Wahjuningrum, Dian Agustin, Saleh, Omnina, Sarig, Rachel, Chaurasia, Akhilanada

arXiv.org Artificial Intelligence, Nov-12-2024

Objectives: Age and gender estimation is crucial for various applications, including forensic investigations and anthropological studies. This research aims to develop a predictive system for age and gender estimation in living individuals, leveraging dental measurements such as Coronal Height (CH), Coronal Pulp Cavity Height (CPCH), and Tooth Coronal Index (TCI). Methods: Machine learning models were employed in our study, including Cat Boost Classifier (CatBoost), Gradient Boosting Machine (GBM), Ada Boost Classifier (AdaBoost), Random Forest (RF), eXtreme Gradient Boosting (XGB), Light Gradient Boosting Machine (LGB), and Extra Trees Classifier (ETC), to analyze dental data from 862 living individuals (459 males and 403 females). Specifically, periapical radiographs from six teeth per individual were utilized, including premolars and molars from both the maxillary and mandibular arches. A novel ensemble learning technique was developed, which uses multiple models, each tailored to distinct dental metrics, to estimate age and gender accurately. Furthermore, an explainable AI model has been created utilizing SHAP, enabling dental experts to make judicious decisions based on comprehensible insight. Results: The RF and XGB models were particularly effective, yielding the highest F1 scores for age and gender estimation. Notably, the XGB model showed a slightly better performance in age estimation, achieving an F1 score of 73.26%. A similar trend for the RF model was also observed in gender estimation, achieving an F1 score of 77.53%. Conclusions: This study marks a significant advancement in dental forensic methods, showcasing the potential of machine learning to automate age and gender estimation processes with improved accuracy.
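
The F1 score reported here is the harmonic mean of precision and recall. The sketch below computes it for a two-class task such as gender estimation. The labels are hypothetical, and the abstract does not say how F1 was averaged across classes for age estimation, so the sketch covers the binary case only.

```python
def f1_score(y_true, y_pred, positive):
    """Binary F1: harmonic mean of precision and recall for the `positive` label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage: 3 true positives, 1 false positive, 1 false negative for label "F".
true_labels = ["F", "F", "F", "F", "M", "M", "M", "M"]
pred_labels = ["F", "F", "F", "M", "F", "M", "M", "M"]
print(f1_score(true_labels, pred_labels, positive="F"))
```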

artificial intelligence, estimation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.08195

Country:

Europe > United Kingdom > England > Essex (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > Maryland > Montgomery County > Bethesda (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Enhanced Credit Score Prediction Using Ensemble Deep Learning Model

Xing, Qianwen, Yu, Chang, Huang, Sining, Zheng, Qi, Mu, Xingyu, Sun, Mengying

arXiv.org Artificial Intelligence, Nov-12-2024

In contemporary economic society, credit scores are crucial for every participant. A robust credit evaluation system is essential for the profitability of core businesses such as credit cards, loans, and investments for commercial banks and the financial sector. This paper combines high-performance models like XGBoost and LightGBM, already widely used in modern banking systems, with the powerful TabNet model. We have developed a potent model capable of accurately determining credit score levels by integrating Random Forest, XGBoost, and TabNet through the stacking technique in ensemble modeling. This approach surpasses the limitations of single models and significantly advances precise credit score prediction. In the following sections, we will explain the techniques we used and thoroughly validate our approach by comprehensively comparing a series of metrics such as Precision, Recall, F1, and AUC. When Random Forest and XGBoost are integrated with the TabNet deep learning architecture, the models complement each other, demonstrating exceptionally strong overall performance.

accuracy, dataset, ensemble model, (10 more...)

arXiv.org Artificial Intelligence

doi: 10.23977/jaip.2024.070316

2410.00256

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
North America > United States > Massachusetts > Suffolk County > Boston (0.05)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Exogenous Randomness Empowering Random Forests

Mei, Tianxing, Fan, Yingying, Lv, Jinchi

arXiv.org Machine Learning, Nov-12-2024

We offer theoretical and empirical insights into the impact of exogenous randomness on the effectiveness of random forests with tree-building rules independent of training data. We formally introduce the concept of exogenous randomness and identify two types of commonly existing randomness: Type I from feature subsampling, and Type II from tie-breaking in tree-building processes. We develop non-asymptotic expansions for the mean squared error (MSE) for both individual trees and forests and establish sufficient and necessary conditions for their consistency. In the special example of the linear regression model with independent features, our MSE expansions are more explicit, providing more understanding of the random forests' mechanisms. It also allows us to derive an upper bound on the MSE with explicit consistency rates for trees and forests. Guided by our theoretical findings, we conduct simulations to further explore how exogenous randomness enhances random forest performance. Our findings unveil that feature subsampling reduces both the bias and variance of random forests compared to individual trees, serving as an adaptive mechanism to balance bias and variance. Furthermore, our results reveal an intriguing phenomenon: the presence of noise features can act as a "blessing" in enhancing the performance of random forests thanks to feature subsampling.
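
The two sources of exogenous randomness named in this abstract can be pictured at a single tree split. The sketch below is illustrative only and is not the paper's formal construction: `split_scores` is a hypothetical list holding one split-quality score per feature (how the scores are computed is out of scope here), and `mtry` is an assumed name for the number of features subsampled.

```python
import random


def choose_split_feature(split_scores, mtry, rng):
    """Pick the feature to split on, using both kinds of exogenous randomness.

    Returns (chosen_feature_index, candidate_feature_indices).
    """
    # Type I randomness: only a random subset of features is considered.
    candidates = rng.sample(range(len(split_scores)), mtry)
    best = max(split_scores[j] for j in candidates)
    tied = [j for j in candidates if split_scores[j] == best]
    # Type II randomness: ties among equally good candidates are broken at random.
    return rng.choice(tied), candidates


# Toy usage: features 1 and 2 tie for the best score.
scores = [0.2, 0.9, 0.9, 0.1, 0.5]
rng = random.Random(0)
for _ in range(3):
    print(choose_split_feature(scores, mtry=2, rng=rng))
```

With `mtry` smaller than the number of features, a dominant feature is sometimes left out of the candidate set, which gives weaker features a chance to be split on.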

estimator, exogenous randomness, randomness, (16 more...)

arXiv.org Machine Learning

2411.07554

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Iceland > Capital Region > Reykjavik (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Enhancing Phishing Detection through Feature Importance Analysis and Explainable AI: A Comparative Study of CatBoost, XGBoost, and EBM Models

Fajar, Abdullah, Yazid, Setiadi, Budi, Indra

arXiv.org Artificial Intelligence, Nov-11-2024

Phishing attacks remain a persistent threat to online security, demanding robust detection methods. This study investigates the use of machine learning to identify phishing URLs, emphasizing the crucial role of feature selection and model interpretability for improved performance. Employing Recursive Feature Elimination, the research pinpointed key features like "length_url," "time_domain_activation," and "Page_rank" as strong indicators of phishing attempts. The study evaluated various algorithms, including CatBoost, XGBoost, and Explainable Boosting Machine (EBM), assessing their robustness and scalability. XGBoost emerged as highly efficient in terms of runtime, making it well-suited for large datasets. CatBoost, on the other hand, demonstrated resilience by maintaining high accuracy even with reduced features. To enhance transparency and trustworthiness, Explainable AI techniques, such as SHAP, were employed to provide insights into feature importance. The study's findings highlight that effective feature selection and model interpretability can significantly bolster phishing detection systems, paving the way for more efficient and adaptable defenses against evolving cyber threats.

accuracy, dataset, detection, (14 more...)

arXiv.org Artificial Intelligence

2411.0686

Country:

Asia > Indonesia > Java > West Java > Depok (0.04)
Asia > Indonesia > Java > West Java > Bandung (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.85)
(3 more...)

Privacy-Preserving Graph-Based Machine Learning with Fully Homomorphic Encryption for Collaborative Anti-Money Laundering

Effendi, Fabrianne, Chattopadhyay, Anupam

arXiv.org Artificial Intelligence, Nov-11-2024

Combating money laundering has become increasingly complex with the rise of cybercrime and digitalization of financial transactions. Graph-based machine learning techniques have emerged as promising tools for Anti-Money Laundering (AML) detection, capturing intricate relationships within money laundering networks. However, the effectiveness of AML solutions is hindered by data silos within financial institutions, limiting collaboration and overall efficacy. This research presents a novel privacy-preserving approach for collaborative AML machine learning, facilitating secure data sharing across institutions and borders while preserving privacy and regulatory compliance. Leveraging Fully Homomorphic Encryption (FHE), computations are directly performed on encrypted data, ensuring the confidentiality of financial data. Notably, FHE over the Torus (TFHE) was integrated with graph-based machine learning using Zama Concrete ML. The research contributes two key privacy-preserving pipelines. First, the development of a privacy-preserving Graph Neural Network (GNN) pipeline was explored. Optimization techniques like quantization and pruning were used to render the GNN FHE-compatible. Second, a privacy-preserving graph-based XGBoost pipeline leveraging Graph Feature Preprocessor (GFP) was successfully developed. Experiments demonstrated strong predictive performance, with the XGBoost model consistently achieving over 99% accuracy, F1-score, precision, and recall on the balanced AML dataset in both unencrypted and FHE-encrypted inference settings. On the imbalanced dataset, the incorporation of graph-based features improved the F1-score by 8%. The research highlights the need to balance the trade-off between privacy and computational efficiency.

dataset, graph-based feature, homomorphic encryption, (13 more...)

arXiv.org Artificial Intelligence

2411.02926

Country:

Asia > Singapore (0.15)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)

Stabilized Inverse Probability Weighting via Isotonic Calibration

van der Laan, Lars, Lin, Ziming, Carone, Marco, Luedtke, Alex

arXiv.org Machine Learning, Nov-9-2024

Inverse weighting with an estimated propensity score is widely used by estimation methods in causal inference to adjust for confounding bias. However, directly inverting propensity score estimates can lead to instability, bias, and excessive variability due to large inverse weights, especially when treatment overlap is limited. In this work, we propose a post-hoc calibration algorithm for inverse propensity weights that generates well-calibrated, stabilized weights from user-supplied, cross-fitted propensity score estimates. Our approach employs a variant of isotonic regression with a loss function specifically tailored to the inverse propensity weights. Through theoretical analysis and empirical studies, we demonstrate that isotonic calibration improves the performance of doubly robust estimators of the average treatment effect.
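
The sketch below shows how isotonic calibration can tame inverse propensity weights. It is not the authors' algorithm: it uses ordinary squared-error isotonic regression (pool adjacent violators) of the treatment indicator on the estimated propensity score, whereas the abstract describes a variant with a loss tailored to the inverse weights. The function names, the toy scores, and the handling of tied scores (left unpooled) are all assumptions.

```python
def pava(values):
    """Pool adjacent violators: non-decreasing least-squares fit to `values`."""
    blocks = []  # each block is [sum of values, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def calibrate_propensity(scores, treated):
    """Replace each estimated score by an isotonic fit of treatment on score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fitted = pava([treated[i] for i in order])
    calibrated = [0.0] * len(scores)
    for rank, i in enumerate(order):
        calibrated[i] = fitted[rank]
    return calibrated


def inverse_weights(calibrated, treated):
    """Weight 1/pi for treated units and 1/(1 - pi) for control units."""
    return [
        1.0 / p if a == 1 else 1.0 / (1.0 - p)
        for p, a in zip(calibrated, treated)
    ]


# Toy usage with made-up propensity estimates and treatment indicators.
est_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
is_treated = [0, 1, 0, 0, 1, 1]
cal = calibrate_propensity(est_scores, is_treated)
print(cal)
print(inverse_weights(cal, is_treated))
```

Each calibrated value is a block average of treatment indicators, so in this sketch a treated unit never receives a calibrated score of exactly zero and a control unit never receives exactly one, which keeps the weights finite.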

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2411.06342

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
