AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
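As a minimal illustration of this definition, the sketch below combines three hypothetical threshold classifiers by majority vote. The classifiers, the 0.5 threshold, and the toy data are all invented for this example; the data is chosen so that each classifier errs on a different sample, which is the situation in which voting helps.

```python
from collections import Counter

def make_threshold_classifier(feature_index, threshold=0.5):
    """Return a weak classifier that looks at a single feature (hypothetical)."""
    def classify(sample):
        return 1 if sample[feature_index] > threshold else 0
    return classify

def majority_vote(classifiers, sample):
    """Predict the label chosen by most of the constituent classifiers."""
    votes = Counter(clf(sample) for clf in classifiers)
    return votes.most_common(1)[0][0]

def accuracy(predict, samples, labels):
    """Fraction of samples for which predict(sample) matches the label."""
    correct = sum(predict(s) == y for s, y in zip(samples, labels))
    return correct / len(samples)

# Toy data: each classifier is wrong on exactly one (different) sample.
SAMPLES = [
    (0.9, 0.8, 0.2),
    (0.1, 0.7, 0.3),
    (0.4, 0.9, 0.8),
    (0.2, 0.1, 0.3),
    (0.7, 0.6, 0.9),
]
LABELS = [1, 0, 1, 0, 1]

# An odd number of binary classifiers, so a vote can never tie.
CLASSIFIERS = [make_threshold_classifier(i) for i in range(3)]

def ensemble(sample):
    return majority_vote(CLASSIFIERS, sample)

if __name__ == "__main__":
    for i, clf in enumerate(CLASSIFIERS):
        print(f"classifier {i}: {accuracy(clf, SAMPLES, LABELS):.2f}")
    print(f"ensemble:     {accuracy(ensemble, SAMPLES, LABELS):.2f}")
```

On this data every individual classifier scores 0.80 and the vote scores 1.00. Voting does not guarantee an improvement in general: when the constituent models make the same mistakes, the ensemble repeats them.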

Identification of Prognostic Biomarkers for Stage III Non-Small Cell Lung Carcinoma in Female Nonsmokers Using Machine Learning

Zheng, Huili, Zhang, Qimin, Gong, Yiru, Liu, Zheyan, Chen, Shaohan

arXiv.org Machine Learning, Aug-29-2024

Lung cancer remains a leading cause of cancer-related deaths globally, with non-small cell lung cancer (NSCLC) being the most common subtype. This study aimed to identify key biomarkers associated with stage III NSCLC in non-smoking females using gene expression profiling from the GDS3837 dataset. Utilizing XGBoost, a machine learning algorithm, the analysis achieved a strong predictive performance with an AUC score of 0.835. The top biomarkers identified - CCAAT enhancer binding protein alpha (C/EBP-alpha), lactate dehydrogenase A4 (LDHA), UNC-45 myosin chaperone B (UNC-45B), checkpoint kinase 1 (CHK1), and hypoxia-inducible factor 1 subunit alpha (HIF-1-alpha) - have been validated in the literature as being significantly linked to lung cancer. These findings highlight the potential of these biomarkers for early diagnosis and personalized therapy, emphasizing the value of integrating machine learning with molecular profiling in cancer research.

biomarker, cancer, lung cancer, (14 more...)

arXiv:2408.16068

Country:

North America > United States > New York > New York County > New York City (0.05)
Asia > Taiwan (0.04)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.38)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Enhancing Intrusion Detection in IoT Environments: An Advanced Ensemble Approach Using Kolmogorov-Arnold Networks

Amouri, Amar, Rahhal, Mohamad Mahmoud Al, Bazi, Yakoub, Butun, Ismail, Mahgoub, Imad

arXiv.org Artificial Intelligence, Aug-29-2024

In recent years, the evolution of machine learning techniques has significantly impacted the field of intrusion detection, particularly within the context of the Internet of Things (IoT). As IoT networks expand, the need for robust security measures to counteract potential threats has become increasingly critical. This paper introduces a hybrid Intrusion Detection System (IDS) that synergistically combines Kolmogorov-Arnold Networks (KANs) with the XGBoost algorithm. Our proposed IDS leverages the unique capabilities of KANs, which utilize learnable activation functions to model complex relationships within data, alongside the powerful ensemble learning techniques of XGBoost, known for its high performance in classification tasks. This hybrid approach not only enhances the detection accuracy but also improves the interpretability of the model, making it suitable for dynamic and intricate IoT environments. Experimental evaluations demonstrate that our hybrid IDS achieves an impressive detection accuracy exceeding 99% in distinguishing between benign and malicious activities. Additionally, we were able to achieve F1 scores, precision, and recall that exceeded 98%. Furthermore, we conduct a comparative analysis against traditional Multi-Layer Perceptron (MLP) networks, assessing performance metrics such as Precision, Recall, and F1-score. The results underscore the efficacy of integrating KANs with XGBoost, highlighting the potential of this innovative approach to significantly strengthen the security framework of IoT networks.

dataset, intrusion detection system, kan, (11 more...)

arXiv:2408.15886

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Asia > Middle East > Saudi Arabia > Riyadh Province > Riyadh (0.04)
Asia > India (0.04)

Genre:

Research Report > New Finding (0.49)
Overview > Innovation (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.72)

Machine Learning-Based Research on the Adaptability of Adolescents to Online Education

Wang, Mingwei, Liu, Sitong

arXiv.org Artificial Intelligence, Aug-29-2024

With the rapid advancement of internet technology, the adaptability of adolescents to online learning has emerged as a focal point of interest within the educational sphere. However, the academic community's efforts to develop predictive models for adolescent online learning adaptability require further refinement and expansion. Utilizing data from the "Chinese Adolescent Online Education Survey" spanning the years 2014 to 2016, this study implements five machine learning algorithms - logistic regression, K-nearest neighbors, random forest, XGBoost, and CatBoost - to analyze the factors influencing adolescent online learning adaptability and to determine the model best suited for prediction. The research reveals that the duration of courses, the financial status of the family, and age are the primary factors affecting students' adaptability in online learning environments, with age having a significant effect on adaptive capacity. Among the predictive models, the random forest, XGBoost, and CatBoost algorithms demonstrate superior forecasting capabilities, with the random forest model being particularly adept at capturing the characteristics of students' adaptability.

adaptability, online education, student, (12 more...)

arXiv:2408.16849

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre:

Research Report > New Finding (0.51)
Research Report > Experimental Study (0.36)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.68)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

Ionospheric Scintillation Forecasting Using Machine Learning

Halawa, Sultan, Alansaari, Maryam, Sharif, Maryam, Alhammadi, Amel, Fernini, Ilias

arXiv.org Machine Learning, Aug-28-2024

This study explores the use of historical data from Global Navigation Satellite System (GNSS) scintillation monitoring receivers to predict the severity of amplitude scintillation, a phenomenon where electron density irregularities in the ionosphere cause fluctuations in GNSS signal power. These fluctuations can be measured using the S4 index, but real-time data is not always available. The research focuses on developing a machine learning (ML) model that can forecast the intensity of amplitude scintillation, categorizing it into low, medium, or high severity levels based on various time and space-related factors. Among six different ML models tested, the XGBoost model emerged as the most effective, demonstrating a remarkable 77% prediction accuracy when trained with a balanced dataset. This work underscores the effectiveness of machine learning in enhancing the reliability and performance of GNSS signals and navigation systems by accurately predicting amplitude scintillation severity.
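For readers unfamiliar with the S4 index mentioned above, it is conventionally defined as the standard deviation of received signal intensity normalized by its mean over a time window. The sketch below computes it from a list of intensity samples and bins the result into the three severity levels. The bin edges (0.2 and 0.5) and the function names are assumptions made for illustration; the abstract does not state the thresholds the study used, and this is not the forecasting model itself.

```python
import math

def s4_index(intensity):
    """Amplitude scintillation index: sqrt((<I^2> - <I>^2) / <I>^2).

    `intensity` is a sequence of signal intensity samples from one window.
    """
    n = len(intensity)
    mean_i = sum(intensity) / n
    mean_i_squared = sum(v * v for v in intensity) / n
    # Clamp at zero to guard against tiny negative values from rounding.
    variance = max(mean_i_squared - mean_i ** 2, 0.0)
    return math.sqrt(variance) / mean_i

def severity(s4, low_max=0.2, medium_max=0.5):
    """Bin an S4 value into a severity label (thresholds are hypothetical)."""
    if s4 < low_max:
        return "low"
    if s4 < medium_max:
        return "medium"
    return "high"

if __name__ == "__main__":
    steady = [1.0, 1.0, 1.0, 1.0]        # no fluctuation
    fluctuating = [0.5, 1.5, 0.5, 1.5]   # strong fluctuation around 1.0
    for window in (steady, fluctuating):
        value = s4_index(window)
        print(f"S4 = {value:.2f} -> {severity(value)}")
```

In a forecasting setup like the one described, labels of this kind would serve as the classification target, with the time and space-related factors as model inputs.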

artificial intelligence, machine learning, scintillation, (15 more...)

arXiv:2409.00118

Country:

North America > United States (0.14)
Asia > Middle East > UAE > Sharjah Emirate > Sharjah (0.06)
South America > Brazil (0.05)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)

Scaling Up Diffusion and Flow-based XGBoost Models

Cresswell, Jesse C., Kim, Taewoo

arXiv.org Artificial Intelligence, Aug-28-2024

Novel machine learning methods for tabular data generation are often developed on small datasets which do not match the scale required for scientific applications. We investigate a recent proposal to use XGBoost as the function approximator in diffusion and flow-matching models on tabular data, which proved to be extremely memory intensive, even on tiny datasets. In this work, we conduct a critical analysis of the existing implementation from an engineering perspective, and show that these limitations are not fundamental to the method; with better implementation it can be scaled to datasets 370x larger than previously used. Our efficient implementation also unlocks scaling models to much larger sizes which we show directly leads to improved performance on benchmark tasks. We also propose algorithmic improvements that can further benefit resource usage and model performance, including multi-output trees which are well-suited to generative modeling. Finally, we present results on large-scale scientific datasets derived from experimental particle physics as part of the Fast Calorimeter Simulation Challenge. Code is available at https://github.com/layer6ai-labs/calo-forest.

diffusion and flow-based xgboost model, scaling

arXiv:2408.16046

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.60)

A nudge to the truth: atom conservation as a hard constraint in models of atmospheric composition using an uncertainty-weighted correction

Sturm, Patrick Obin, Silva, Sam J.

arXiv.org Artificial Intelligence, Aug-28-2024

Computational models of atmospheric composition are not always physically consistent. For example, not all models respect fundamental conservation laws such as conservation of atoms in an interconnected chemical system. In well performing models, these nonphysical deviations are often ignored because they are frequently minor, and thus only need a small nudge to perfectly conserve mass. Here we introduce a method that anchors a prediction from any numerical model to physically consistent hard constraints, nudging concentrations to the nearest solution that respects the conservation laws. This closed-form model-agnostic correction uses a single matrix operation to minimally perturb the predicted concentrations to ensure that atoms are conserved to machine precision. To demonstrate this approach, we train a gradient boosting decision tree ensemble to emulate a small reference model of ozone photochemistry and test the effect of the correction on accurate but non-conservative predictions. The nudging approach minimally perturbs the already well-predicted results for most species, but decreases the accuracy of important oxidants, including radicals. We develop a weighted extension of this nudging approach that considers the uncertainty and magnitude of each species in the correction. This species-level weighting approach is essential to accurately predict important low concentration species such as radicals. We find that applying the uncertainty-weighted correction to the nonphysical predictions slightly improves overall accuracy, by nudging the predictions to a more likely mass-conserving solution.
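The idea of nudging a prediction to the nearest conserving solution can be sketched with a standard minimum-norm projection onto a linear constraint. This is an illustration under stated assumptions, not the paper's formulation: it handles a single conserved element (one constraint row) in pure Python, and the function name, the example species, and the weights are hypothetical. With several conserved elements the scalar division below would become a small matrix solve.

```python
def nudge(concentrations, atoms_per_molecule, total_atoms, weights=None):
    """Minimally perturb concentrations so that one atom count is conserved.

    Solves: minimize sum((c_new - c)^2 / w) subject to a . c_new = total_atoms,
    which gives c_new = c - w * a * (a . c - total_atoms) / (a . (w * a)).
    With weights=None all species are treated equally (w = 1).
    """
    if weights is None:
        weights = [1.0] * len(concentrations)
    # How far the prediction is from conserving the atom count.
    residual = sum(a * c for a, c in zip(atoms_per_molecule, concentrations))
    residual -= total_atoms
    scale = sum(a * w * a for a, w in zip(atoms_per_molecule, weights))
    return [
        c - w * a * residual / scale
        for c, a, w in zip(concentrations, atoms_per_molecule, weights)
    ]

if __name__ == "__main__":
    # Hypothetical oxygen budget over O, O2, O3 (1, 2, 3 oxygen atoms each).
    oxygen_atoms = [1, 2, 3]
    predicted = [0.1, 1.0, 0.5]   # holds 3.6 oxygen atoms, but 3.5 exist
    true_total = 3.5

    plain = nudge(predicted, oxygen_atoms, true_total)
    # Assumed weights: a small weight marks a species that should move little,
    # for example one with low concentration or low predictive uncertainty.
    weighted = nudge(predicted, oxygen_atoms, true_total, weights=[0.01, 1.0, 1.0])

    print("unweighted:", [round(c, 5) for c in plain])
    print("weighted:  ", [round(c, 5) for c in weighted])
```

Both results conserve the oxygen count up to floating-point rounding; the weighted variant shifts almost all of the correction onto the second and third species. The sketch does not enforce non-negative concentrations.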

conservation, constraint, prediction, (14 more...)

arXiv:2408.16109

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > Wales (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry:

Materials > Chemicals (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)

Free Lunch in the Forest: Functionally-Identical Pruning of Boosted Tree Ensembles

Emine, Youssouf, Forel, Alexandre, Malek, Idriss, Vidal, Thibaut

arXiv.org Artificial Intelligence, Aug-28-2024

Tree ensembles, including boosting methods, are highly effective and widely used for tabular data. However, large ensembles lack interpretability and require longer inference times. We introduce a method to prune a tree ensemble into a reduced version that is "functionally identical" to the original model. In other words, our method guarantees that the prediction function stays unchanged for any possible input. As a consequence, this pruning algorithm is lossless for any aggregated metric. We formalize the problem of functionally identical pruning on ensembles, introduce an exact optimization model, and provide a fast yet highly effective method to prune large ensembles. Our algorithm iteratively prunes considering a finite set of points, which is incrementally augmented using an adversarial model. In multiple computational experiments, we show that our approach is a "free lunch", significantly reducing the ensemble size without altering the model's behavior. Thus, we can preserve state-of-the-art performance at a fraction of the original model's size.

ensemble, original ensemble, tree ensemble, (14 more...)

arXiv:2408.16167

Country:

North America > United States > Wisconsin (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

A Comprehensive Benchmark of Machine and Deep Learning Across Diverse Tabular Datasets

Shmuel, Assaf, Glickman, Oren, Lazebnik, Teddy

arXiv.org Artificial Intelligence, Aug-27-2024

The analysis of tabular datasets is highly prevalent both in scientific research and real-world applications of Machine Learning (ML). Unlike many other ML tasks, Deep Learning (DL) models often do not outperform traditional methods in this area. Previous comparative benchmarks have shown that DL performance is frequently equivalent or even inferior to models such as Gradient Boosting Machines (GBMs). In this study, we introduce a comprehensive benchmark aimed at better characterizing the types of datasets where DL models excel. Although several important benchmarks for tabular datasets already exist, our contribution lies in the variety and depth of our comparison: we evaluate 111 datasets with 20 different models, including both regression and classification tasks. These datasets vary in scale and include both those with and without categorical variables. Importantly, our benchmark contains a sufficient number of datasets where DL models perform best, allowing for a thorough analysis of the conditions under which DL models excel. Building on the results of this benchmark, we train a model that predicts scenarios where DL models outperform alternative methods with 86.1% accuracy (AUC 0.78). We present insights derived from this characterization and compare these findings to previous benchmarks.

best average rank median rank, dataset, dl model, (11 more...)

arXiv:2408.14817

Country: Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Automated Machine Learning in Insurance

Dong, Panyi, Quan, Zhiyu

arXiv.org Artificial Intelligence, Aug-26-2024

Machine Learning (ML), as described by Mitchell et al. (1990), is a multidisciplinary subfield of Artificial Intelligence (AI) focused on developing and implementing algorithms and statistical models that enable computer systems to perform data-driven tasks or make predictions through "leveraging data" and iterative learning processes. This data-driven approach guides the design of ML algorithms, allowing them to grasp the distributions and structures within datasets and unveil correlations that elude traditional mathematical and statistical methods. Professionals in data-related fields, such as data scientists and ML engineers, can engage in autonomous decision-making based on data and benefit from cutting-edge predictions generated by modern ML models. In recent decades, ML has significantly reshaped various industries and gained widespread popularity in academia due to its exceptional predictive capabilities. As summarized by Jordan and Mitchell (2015), ML has made significant contributions in various fields, including robotics, autonomous driving, language processing, and computer vision. The medical and healthcare industry, as suggested by Kononenko (2001) and Qayyum et al. (2020), is increasingly adopting ML for applications such as medical image analysis and clinical treatments. Furthermore, ML models have significantly improved personalization and targeting, marketing strategy, and customer engagement in the marketing sector, as summarized by Ma and Sun (2020). Guerra and Castelli (2021) present the ML innovations in the banking sector, particularly in the analysis of liquidity risks, bank risks, and credit risks. Additionally, there is a growing trend in adopting ML models in the insurance sector and among actuarial researchers and industry practitioners, as evidenced by recent literature.

automl, dataset, pipeline, (15 more...)

arXiv:2408.14331

Country:

Asia > Middle East > Jordan (0.24)
North America > United States > Wisconsin (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(5 more...)

Genre:

Overview (0.92)
Research Report > New Finding (0.67)

Industry:

Banking & Finance > Insurance (1.00)
Banking & Finance > Risk Management (0.88)
Transportation > Ground > Road (0.48)
Health & Medicine > Diagnostic Medicine > Imaging (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Augmented Functional Random Forests: Classifier Construction and Unbiased Functional Principal Components Importance through Ad-Hoc Conditional Permutations

Maturo, Fabrizio, Porreca, Annamaria

arXiv.org Machine Learning, Aug-23-2024

This paper introduces a novel supervised classification strategy that integrates functional data analysis (FDA) with tree-based methods, addressing the challenges of high-dimensional data and enhancing the classification performance of existing functional classifiers. Specifically, we propose augmented versions of functional classification trees and functional random forests, incorporating a new tool for assessing the importance of functional principal components. This tool provides an ad-hoc method for determining unbiased permutation feature importance in functional data, particularly when dealing with correlated features derived from successive derivatives. Our study demonstrates that these additional features can significantly enhance the predictive power of functional classifiers. Experimental evaluations on both real-world and simulated datasets showcase the effectiveness of the proposed methodology, yielding promising results compared to existing methods.

afct, derivative, functional data, (14 more...)

arXiv:2408.13179

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.62)
