AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
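
As a minimal sketch of why combining learners can help, the snippet below implements hard majority voting, one of the simplest combination rules, and computes the textbook probability that a majority of independent binary classifiers is correct. The independence assumption is an idealization that rarely holds exactly in practice, and the function names are illustrative only.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the most common label among the base learners' predictions.
    Counter.most_common breaks ties by first-encountered order."""
    return Counter(predictions).most_common(1)[0][0]


def majority_accuracy(n_learners, p):
    """Probability that a strict majority of n independent learners, each
    correct with probability p, is correct (binary task, n odd).
    Assumes fully independent errors, an idealized case."""
    need = n_learners // 2 + 1
    return sum(comb(n_learners, k) * p**k * (1 - p) ** (n_learners - k)
               for k in range(need, n_learners + 1))


print(majority_vote(["spam", "ham", "spam"]))  # spam
print(round(majority_accuracy(3, 0.7), 3))     # 0.784
print(majority_accuracy(11, 0.7) > majority_accuracy(3, 0.7))  # True
```

Three learners that are each 70% accurate yield a 78.4% accurate majority vote under this assumption; correlated errors shrink or erase that gain.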

Exploratory Analysis and Augmentation of NSL-KDD Data Using Deep Generative Adversarial Networks to Improve the Performance of the Extreme Gradient Boosting Algorithm in Cyber-Attack Type Classification

Santoso, K. P., Madany, F. A., Suryotrisongko, H.

arXiv.org Artificial Intelligence, Dec-17-2023

This study proposes using Deep Generative Adversarial Networks (GANs) to augment the NSL-KDD dataset. The primary objective is to improve the performance of eXtreme Gradient Boosting (XGBoost) in classifying cyber-attacks in the NSL-KDD dataset. The proposed method achieved an accuracy of 99.53% with XGBoost without GAN-based data augmentation, and 99.78% with GAN-based data augmentation.

adversarial network, attack (serangan), cyber (siber), (11 more...)

arXiv: 2312.10669

Country:

Asia > Indonesia > Java > East Java > Surabaya (0.05)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Prediction of Crash Injury Severity in Florida's Interstate-95

Anik, B M Tazbiul Hassan, Rashid, Md Mobasshir, Ahsan, Md Jamil

arXiv.org Artificial Intelligence, Dec-16-2023

Drivers can sustain serious injuries in traffic accidents. In this study, records of traffic crashes on Florida's Interstate-95 from 2016 to 2021 were collected, and several classification methods were used to estimate the severity of driver injuries. Logistic regression was applied for feature selection. To compare model performance, several evaluation metrics, including accuracy, recall, and area under the curve (AUC), were computed. The AdaBoost algorithm outperformed the others in terms of recall and AUC. SHAP values were also generated to explain the classification model's results. This analytical study can be used to examine factors that contribute to the severity of driver injuries in crashes.
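
For readers unfamiliar with AdaBoost, here is a minimal from-scratch sketch of discrete AdaBoost with one-feature decision stumps. It is a generic textbook version written for illustration, not the authors' configuration; the function names, the exhaustive stump search, and the error clipping constant are assumptions chosen for brevity.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Weak learner: +1 if polarity * (x[feature] - threshold) >= 0, else -1."""
    return 1 if polarity * (x[feature] - threshold) >= 0 else -1


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost; labels in y must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # uniform initial sample weights
    model = []
    for _ in range(n_rounds):
        err, f, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote weight
        # Up-weight misclassified samples, down-weight correctly classified ones
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        model.append((alpha, f, t, pol))
    return model


def adaboost_predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in model)
    return 1 if score >= 0 else -1


X = [[0], [1], [2], [3], [4], [5]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost_fit(X, y, n_rounds=5)
print([adaboost_predict(model, x) for x in X])  # [-1, -1, -1, 1, 1, 1]
```

Each round reweights the training samples so the next stump concentrates on the examples the ensemble currently gets wrong; production libraries add refinements (multiclass variants, learning rates) not shown here.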

injury, prediction, severity, (14 more...)

arXiv: 2312.12459

Country:

North America > United States > Florida > Orange County > Orlando (0.15)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
North America > United States > Georgia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.71)

Industry:

Transportation > Ground > Road (0.94)
Health & Medicine (0.94)
Transportation > Infrastructure & Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Random Forest Variable Importance-based Selection Algorithm in Class Imbalance Problem

Nam, Yunbi, Han, Sunwoo

arXiv.org Machine Learning, Dec-16-2023

Random Forest (RF) is a machine learning method that offers many advantages, including the ability to easily measure variable importance. Class balancing is a well-known solution to the class imbalance problem; however, its effect on RF variable importance has not been actively studied. In this paper, we study the effect of class balancing on RF variable importance. Our simulation results show that over-sampling is effective in correctly measuring variable importance in class-imbalanced situations with small sample sizes, while under-sampling fails to differentiate important and non-informative variables. We then propose a variable selection algorithm that utilizes RF variable importance and its confidence interval. Through an experimental study using many real and artificial datasets, we demonstrate that our proposed algorithm efficiently selects an optimal feature set, leading to improved prediction performance in class-imbalanced problems.
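
Random over-sampling is one simple class-balancing technique of the kind the abstract compares; the sketch below duplicates randomly chosen minority-class rows until every class matches the largest one. It is an illustrative assumption, not necessarily the over-sampling variant used in the paper, and the function name is hypothetical. The balanced data would then be passed to a Random Forest to compute variable importance.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows (with replacement) of each smaller class
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        idx = [i for i, yi in enumerate(y) if yi == label]
        for _ in range(target - count):
            i = rng.choice(idx)  # sample an existing row of this class
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2          # 8:2 class imbalance
Xb, yb = random_oversample(X, y)
print(Counter(yb))             # Counter({0: 8, 1: 8})
```

Because the added rows are exact copies, over-sampling adds no new information; it only changes how much weight the minority class receives during tree construction.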

artificial intelligence, decision tree learning, machine learning, (17 more...)

arXiv: 2312.10573

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)

FuXi-S2S: An accurate machine learning model for global subseasonal forecasts

Chen, Lei, Zhong, Xiaohui, Wu, Jie, Chen, Deliang, Xie, Shangping, Chao, Qingchen, Lin, Chensen, Hu, Zixin, Lu, Bo, Li, Hao, Qi, Yuan

arXiv.org Artificial Intelligence, Dec-15-2023

Skillful subseasonal forecasts beyond 2 weeks are crucial for a wide range of applications across various sectors of society. Recently, state-of-the-art machine-learning-based weather forecasting models have made significant advances, outperforming the high-resolution forecast (HRES) from the European Centre for Medium-Range Weather Forecasts (ECMWF). However, the potential of machine learning models for subseasonal forecasting has yet to be fully explored. In this study, we introduce FuXi Subseasonal-to-Seasonal (FuXi-S2S), a machine-learning-based subseasonal forecasting model that provides global daily mean forecasts up to 42 days ahead, covering 5 upper-air atmospheric variables at 13 pressure levels and 11 surface variables. FuXi-S2S integrates an enhanced FuXi base model with a perturbation module for flow-dependent perturbations in hidden features, and incorporates Perlin noise to perturb initial conditions. The model is developed using 72 years of daily statistics from ECMWF ERA5 reanalysis data. Compared to the ECMWF Subseasonal-to-Seasonal (S2S) reforecasts, FuXi-S2S delivers superior deterministic and ensemble forecasts for total precipitation (TP), outgoing longwave radiation (OLR), and geopotential at 500 hPa (Z500). Although it performs slightly worse in predicting 2-meter temperature (T2M), it has clear advantages over land areas. For extreme forecasts, FuXi-S2S outperforms ECMWF S2S globally for TP. Furthermore, FuXi-S2S forecasts surpass the ECMWF S2S reforecasts in predicting the Madden Julian Oscillation (MJO), a key source of subseasonal predictability, extending skillful MJO prediction from 30 days to 36 days.

forecast, fuxi-s2s, prediction, (17 more...)

arXiv: 2312.09926

Country:

Asia > Pakistan (0.05)
Asia > East Asia (0.05)
Asia > China > Shanghai > Shanghai (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government > North America Government > United States Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)

Context-Aware Stress Monitoring using Wearable and Mobile Technologies in Everyday Settings

Aqajari, Seyed Amir Hossein, Labbaf, Sina, Tran, Phuc Hoang, Nguyen, Brenda, Mehrabadi, Milad Asgari, Levorato, Marco, Dutt, Nikil, Rahmani, Amir M.

arXiv.org Artificial Intelligence, Dec-14-2023

Daily monitoring of stress is a critical component of maintaining optimal physical and mental health. Physiological signals and contextual information have recently emerged as promising indicators for detecting instances of heightened stress. Nonetheless, developing a real-time monitoring system that uses both physiological and contextual data to anticipate stress levels in everyday settings, while also gathering stress labels from participants, is a significant challenge. We present a monitoring system that objectively tracks daily stress levels by utilizing both physiological and contextual data in a daily-life environment. Additionally, we have integrated a smart labeling approach to optimize the collection of ecological momentary assessments (EMAs), which are required for building machine learning models for stress detection. We propose a three-tier Internet-of-Things-based system architecture to address these challenges. We used cross-validation to accurately estimate the performance of our stress models. We achieved an F1-score of 70% with a Random Forest classifier using both photoplethysmography (PPG) and contextual data, which is considered an acceptable score for models built for everyday settings. Using PPG data alone, the highest F1-score achieved is approximately 56%, emphasizing the significance of incorporating both PPG and contextual data in stress detection tasks.
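
The F1-scores quoted above combine precision and recall into a single number. The sketch below computes the binary F1-score for one positive class; the labels are hypothetical, and the paper may instead report a multiclass or averaged variant.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = "stressed", 0 = "not stressed"
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # 0.667
```

Unlike accuracy, F1 ignores true negatives, which makes it more informative when stressed moments are relatively rare.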

algorithm, contextual data, participant, (15 more...)

arXiv: 2401.05367

Country:

North America > United States > California > Orange County > Irvine (0.15)
Europe > Hungary > Budapest > Budapest (0.04)
Asia > Nepal (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.67)

PySCIPOpt-ML: Embedding Trained Machine Learning Models into Mixed-Integer Programs

Turner, Mark, Chmiela, Antonia, Koch, Thorsten, Winkler, Michael

arXiv.org Artificial Intelligence, Dec-13-2023

Mixed-integer programming (MIP) is a standard tool for modelling real-world optimisation problems. However, for many of these problems, either the information describing the relations between variables is incomplete or those relations are highly complex. To overcome both hurdles, machine learning (ML) models are often embedded in the MIP as surrogate models that represent these relations. Because of the large number of available ML frameworks, formulating ML models in MIPs is highly non-trivial. In this paper we propose a tool for the automatic MIP formulation of trained ML models, allowing easy integration of ML constraints into MIPs. In addition, we introduce a library of MIP instances with embedded ML constraints. The project is available at https://github.com/Opt-Mucca/PySCIPOpt-ML.
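
To illustrate what "formulating an ML model in a MIP" involves, the block below shows one common big-M encoding of a single ReLU neuron $y = \max(0, a)$ with pre-activation $a = w^\top x + b$. It assumes known bounds $L \le a \le U$ with $L < 0 < U$ and introduces a binary variable $z$; it is a generic textbook formulation and not necessarily the one PySCIPOpt-ML generates.

```latex
\begin{aligned}
  y &\ge a,               & y &\ge 0, \\
  y &\le a - L\,(1 - z),  & y &\le U z, \\
  z &\in \{0, 1\}.
\end{aligned}
```

With $z = 1$ the constraints force $y = a$ (and hence $a \ge 0$); with $z = 0$ they force $y = 0$ (and hence $a \le 0$), so every feasible point satisfies $y = \max(0, a)$. Tighter bounds $L$ and $U$ generally give a stronger linear relaxation.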

artificial intelligence, machine learning, predictor, (16 more...)

arXiv: 2312.08074

Country: Europe > Germany (0.46)

Genre: Research Report (0.51)

Industry:

Automobiles & Trucks (0.70)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

MotherNet: A Foundational Hypernetwork for Tabular Classification

Müller, Andreas, Curino, Carlo, Ramakrishnan, Raghu

arXiv.org Artificial Intelligence, Dec-13-2023

The advent of Foundation Models is transforming machine learning across many modalities (e.g., language, images, videos), with prompt engineering replacing training in many settings. Recent work on tabular data (e.g., TabPFN) hints at a similar opportunity to build Foundation Models for classification on numerical data. In this paper, we go one step further and propose a hypernetwork architecture, MotherNet, that is trained on millions of classification tasks and, once prompted with a never-before-seen training set, generates the weights of a trained "child" neural network. Like other Foundation Models, MotherNet replaces training on specific datasets with in-context learning through a single forward pass. In contrast to existing hypernetworks that were either task-specific or trained for relatively constrained multi-task settings, MotherNet is trained to generate networks that perform multiclass classification on arbitrary tabular datasets without any dataset-specific gradient descent. The child network generated by MotherNet using in-context learning outperforms neural networks trained using gradient descent on small datasets, and is competitive with predictions by TabPFN and standard ML methods such as Gradient Boosting. Unlike a direct application of transformer models like TabPFN, MotherNet-generated networks are highly efficient at inference time. This methodology opens up a new approach to building predictive models on tabular data that is both efficient and robust, without any dataset-specific training.

dataset, mothernet, tabpfn, (12 more...)

arXiv: 2312.08598

Country:

North America > United States > Wisconsin (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings

Kireev, Klim, Andriushchenko, Maksym, Troncoso, Carmela, Flammarion, Nicolas

arXiv.org Artificial Intelligence, Dec-13-2023

Research on adversarial robustness has focused primarily on image and text data. Yet many scenarios in which a lack of robustness can result in serious risks, such as fraud detection, medical diagnosis, or recommender systems, often rely not on images or text but on tabular data. Adversarial robustness in tabular data poses two serious challenges. First, tabular datasets often contain categorical features and therefore cannot be tackled directly with existing optimization procedures. Second, in the tabular domain, algorithms that are not based on deep networks are widely used and offer great performance, but algorithms to enhance robustness are tailored to neural networks (e.g., adversarial training). In this paper, we tackle both challenges. We present a method that allows us to train adversarially robust deep networks for tabular data and to transfer this robustness to other classifiers via universal robust embeddings tailored to categorical data. These embeddings, created using a bilevel alternating minimization framework, can be transferred to boosted trees or random forests, making them robust without the need for adversarial training while preserving their high accuracy on tabular data. We show that our methods outperform existing techniques within a practical threat model suitable for tabular data.

adversary, neural network, robustness, (16 more...)

arXiv: 2306.04064

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
Health & Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
(2 more...)

An Explainable Machine Learning Framework for the Accurate Diagnosis of Ovarian Cancer

Newaz, Asif, Taharat, Abdullah, Islam, Md Sakibul, Akanda, A. G. M. Fuad Hasan

arXiv.org Artificial Intelligence, Dec-11-2023

Ovarian cancer (OC) is one of the most prevalent types of cancer in women. Early and accurate diagnosis is crucial for patient survival. However, most women are diagnosed at advanced stages due to the lack of effective biomarkers and accurate screening tools. While previous studies sought a common biomarker, our study suggests different biomarkers for the premenopausal and postmenopausal populations. This can provide a new perspective in the search for novel predictors for the effective diagnosis of OC. Lack of explainability is a major limitation of current AI systems: the stochastic nature of ML algorithms raises concerns about the reliability of the system, as it is difficult to interpret the reasons behind its decisions. To increase the trustworthiness and accountability of the diagnostic system, and to provide transparency and explanations behind its predictions, explainable AI has been incorporated into the ML framework. SHAP is employed to quantify the contributions of the selected biomarkers and determine the most discriminative features. A hybrid decision support system has been established that can eliminate the bottlenecks caused by the black-box nature of ML algorithms, providing a safe and trustworthy AI tool. The diagnostic accuracy of the proposed system exceeds that of existing methods as well as the state-of-the-art ROMA algorithm by a substantial margin, which signifies its potential as an effective tool in the differential diagnosis of OC.
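
SHAP attributions are based on Shapley values. The brute-force sketch below computes exact Shapley values for a tiny model by enumerating every feature coalition, replacing features outside a coalition with a baseline value. That value function is one simple assumption among several; the SHAP library uses faster approximations and model-specific algorithms, and the toy models here are hypothetical stand-ins for a trained classifier.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.
    Features outside a coalition are set to their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, k):
                coalition = set(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Toy models (hypothetical) evaluated at x = [1, 1] against baseline [0, 0]
print(shapley_values(lambda z: 2 * z[0] + z[1], [1, 1], [0, 0]))  # [2.0, 1.0]
print(shapley_values(lambda z: z[0] * z[1], [1, 1], [0, 0]))      # [0.5, 0.5]
```

The attributions always sum to f(x) minus f(baseline). Enumeration costs grow exponentially with the number of features, which is why practical tools approximate.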

algorithm, cancer, prediction, (17 more...)

arXiv: 2312.08381

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Ovarian Cancer (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(3 more...)

Anytime Approximate Formal Feature Attribution

Yu, Jinqiang, Farr, Graham, Ignatiev, Alexey, Stuckey, Peter J.

arXiv.org Artificial Intelligence, Dec-11-2023

The widespread use of artificial intelligence (AI) algorithms and machine learning (ML) models, together with a number of crucial issues pertaining to them, calls for explainable artificial intelligence (XAI). A key explainability question is: given that a decision was made, which input features contributed to it? Although a range of XAI approaches exist to tackle this problem, most of them have significant limitations. Heuristic XAI approaches lack quality guarantees and often try to approximate Shapley values, which is not the same as explaining which features contribute to a decision. A recent alternative is so-called formal feature attribution (FFA), which defines feature importance as the fraction of formal abductive explanations (AXp's) containing the given feature. This measures feature importance from the viewpoint of formal reasoning about the model's behavior. Computing FFA directly from its definition is challenging because it involves counting AXp's, although it can be approximated. Based on these results, this paper makes several contributions. First, it gives compelling evidence that computing FFA is intractable, even if the set of contrastive formal explanations (CXp's) is provided, by proving that the problem is #P-hard. Second, by using the duality between AXp's and CXp's, it proposes an efficient heuristic to switch from CXp enumeration to AXp enumeration on the fly, resulting in an adaptive explanation enumeration algorithm that effectively approximates FFA in an anytime fashion. Finally, experimental results obtained on a range of widely used datasets demonstrate the effectiveness of the proposed FFA approximation approach in terms of the error of FFA approximation, as well as the number of explanations computed and their diversity, given a fixed time limit.
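
The FFA definition above is a simple ratio once the AXp's are known. The sketch below assumes the AXp's (subset-minimal sets of feature names) are already given, which sidesteps the hard enumeration step the paper addresses; the feature names and explanations are hypothetical. Applying the same ratio to an incomplete list, such as one produced by enumeration stopped early, yields only an approximation of FFA.

```python
from fractions import Fraction


def ffa(axps, features):
    """Formal feature attribution: for each feature, the fraction of the
    given abductive explanations (AXp's) that contain it."""
    if not axps:
        raise ValueError("need at least one abductive explanation")
    return {f: Fraction(sum(f in e for e in axps), len(axps))
            for f in features}


# Hypothetical AXp's for one prediction (pairwise subset-incomparable)
axps = [{"age", "income"}, {"income", "debt"}, {"zip"}]
scores = ffa(axps, ["age", "income", "debt", "zip", "height"])
print(scores["income"])  # 2/3
print(scores["height"])  # 0
```

Exact fractions make it easy to see that a feature appearing in no AXp receives zero attribution, regardless of how the model uses it numerically.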

axp, enumeration, explanation, (13 more...)

arXiv: 2312.06973

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.54)
(2 more...)
