Goto

Collaborating Authors

 bagging


A Quantum Bagging Algorithm with Unsupervised Base Learners for Label Corrupted Datasets

Rathi, Neeshu, Kumar, Sanjeev

arXiv.org Artificial Intelligence

The development of noise-resilient quantum machine learning (QML) algorithms is critical in the noisy intermediate-scale quantum (NISQ) era. In this work, we propose a quantum bagging framework that uses QMeans clustering as the base learner to reduce prediction variance and enhance robustness to label noise. Unlike bagging frameworks built on supervised learners, our method leverages the unsupervised nature of QMeans, combined with quantum bootstrapping via QRAM-based sampling and bagging aggregation through majority voting. Through extensive simulations on both noisy classification and regression tasks, we demonstrate that the proposed quantum bagging algorithm performs comparably to its classical counterpart using KMeans while exhibiting greater resilience to label corruption than supervised bagging methods. This highlights the potential of unsupervised quantum bagging in learning from unreliable data.



A Comprehensive Comparative Study of Individual ML Models and Ensemble Strategies for Network Intrusion Detection Systems

Bibers, Ismail, Arreche, Osvaldo, Abdallah, Mustafa

arXiv.org Artificial Intelligence

The escalating frequency of intrusions in networked systems has spurred the exploration of new research avenues in devising artificial intelligence (AI) techniques for intrusion detection systems (IDS). Various AI techniques have been used to automate network intrusion detection tasks, yet each model possesses distinct strengths and weaknesses. Selecting the optimal model for a given dataset can pose a challenge, necessitating the exploration of ensemble methods to enhance generalization and applicability in network intrusion detection. This paper addresses this gap by conducting a comprehensive evaluation of diverse individual models and both simple and advanced ensemble methods for network IDS. We introduce an ensemble learning framework tailored for assessing individual models and ensemble methods in network intrusion detection tasks. Our framework encompasses the loading of input datasets, training of individual models and ensemble methods, and the generation of evaluation metrics. Furthermore, we incorporate all features across individual models and ensemble techniques. The study presents results for our framework, encompassing 14 methods, including various bagging, stacking, blending, and boosting techniques applied to multiple base learners such as decision trees, neural networks, and among others. We evaluate the framework using two distinct network intrusion datasets, RoEduNet-SIMARGL2021 and CICIDS-2017, each possessing unique characteristics. Additionally, we categorize AI models based on their performances on our evaluation metrics and via their confusion matrices. Our assessment demonstrates the efficacy of learning across most setups explored in this study. Furthermore, we contribute to the community by releasing our source codes, providing a foundational ensemble learning framework for network intrusion detection.


Improving Online Bagging for Complex Imbalanced Data Stream

Przybyl, Bartosz, Stefanowski, Jerzy

arXiv.org Artificial Intelligence

Learning classifiers from imbalanced and concept drifting data streams is still a challenge. Most of the current proposals focus on taking into account changes in the global imbalance ratio only and ignore the local difficulty factors, such as the minority class decomposition into sub-concepts and the presence of unsafe types of examples (borderline or rare ones). As the above factors present in the stream may deteriorate the performance of popular online classifiers, we propose extensions of resampling online bagging, namely Neighbourhood Undersampling or Oversampling Online Bagging to take better account of the presence of unsafe minority examples. The performed computational experiments with synthetic complex imbalanced data streams have shown their advantage over earlier variants of online bagging resampling ensembles.


Autonomous Strike UAVs for Counterterrorism Missions: Challenges and Preliminary Solutions

Aljohani, Meshari, Mukkamalai, Ravi, Olariu, Stephen

arXiv.org Artificial Intelligence

Unmanned Aircraft Vehicles (UAVs) are becoming a crucial tool in modern warfare, primarily due to their cost-effectiveness, risk reduction, and ability to perform a wider range of activities. The use of autonomous UAVs to conduct strike missions against highly valuable targets is the focus of this research. Due to developments in ledger technology, smart contracts, and machine learning, such activities formerly carried out by professionals or remotely flown UAVs are now feasible. Our study provides the first in-depth analysis of challenges and preliminary solutions for successful implementation of an autonomous UAV mission. Specifically, we identify challenges that have to be overcome and propose possible technical solutions for the challenges identified. We also derive analytical expressions for the success probability of an autonomous UAV mission, and describe a machine learning model to train the UAV.


Prediction of the outcome of a Twenty-20 Cricket Match : A Machine Learning Approach

Shenoy, Ashish V, Singhvi, Arjun, Racha, Shruthi, Tunuguntla, Srinivas

arXiv.org Artificial Intelligence

Twenty20 cricket, sometimes written Twenty-20, and often abbreviated to T20, is a short form of cricket. In a Twenty20 game the two teams of 11 players have a single innings each, which is restricted to a maximum of 20 overs. This version of cricket is especially unpredictable and is one of the reasons it has gained popularity over recent times. However, in this paper we try four different machine learning approaches for predicting the results of T20 Cricket Matches. Specifically we take in to account: previous performance statistics of the players involved in the competing teams, ratings of players obtained from reputed cricket statistics websites, clustering the players' with similar performance statistics and propose a novel method using an ELO based approach to rate players. We compare the performances of each of these feature engineering approaches by using different ML algorithms, including logistic regression, support vector machines, bayes network, decision tree, random forest.


BagFlip: A Certified Defense against Data Poisoning

Zhang, Yuhao, Albarghouthi, Aws, D'Antoni, Loris

arXiv.org Artificial Intelligence

Machine learning models are vulnerable to data-poisoning attacks, in which an attacker maliciously modifies the training set to change the prediction of a learned model. In a trigger-less attack, the attacker can modify the training set but not the test inputs, while in a backdoor attack the attacker can also modify test inputs. Existing model-agnostic defense approaches either cannot handle backdoor attacks or do not provide effective certificates (i.e., a proof of a defense). We present BagFlip, a model-agnostic certified approach that can effectively defend against both trigger-less and backdoor attacks. We evaluate BagFlip on image classification and malware detection datasets. BagFlip is equal to or more effective than the state-of-the-art approaches for trigger-less attacks and more effective than the state-of-the-art approaches for backdoor attacks.


Diagnosis of Parkinson's Disease Based on Voice Signals Using SHAP and Hard Voting Ensemble Method

Ghaheri, Paria, Nasiri, Hamid, Shateri, Ahmadreza, Homafar, Arman

arXiv.org Artificial Intelligence

Background and Objective: Parkinson's disease (PD) is the second most common progressive neurological condition after Alzheimer's, characterized by motor and non-motor symptoms. Developing a method to diagnose the condition in its beginning phases is essential because of the significant number of individuals afflicting with this illness. PD is typically identified using motor symptoms or other Neuroimaging techniques, such as DATSCAN and SPECT. These methods are expensive, time-consuming, and unavailable to the general public; furthermore, they are not very accurate. These constraints encouraged us to develop a novel technique using SHAP and Hard Voting Ensemble Method based on voice signals. Methods: In this article, we used Pearson Correlation Coefficients to understand the relationship between input features and the output, and finally, input features with high correlation were selected. These selected features were classified by the Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Gradient Boosting, and Bagging. Moreover, the Hard Voting Ensemble Method was determined based on the performance of the four classifiers. At the final stage, we proposed Shapley Additive exPlanations (SHAP) to rank the features according to their significance in diagnosing Parkinson's disease. Results and Conclusion: The proposed method achieved 85.42% accuracy, 84.94% F1-score, 86.77% precision, 87.62% specificity, and 83.20% sensitivity. The study's findings demonstrated that the proposed method outperformed state-of-the-art approaches and can assist physicians in diagnosing Parkinson's cases.


XGBoost in Oracle 20c

#artificialintelligence

Another of the new machine learning algorithms in Oracle 21c Database is called XGBoost. Most people will have come across this algorithm due to its recent popularity with winners of Kaggle competitions and other similar events. XGBoost is an open source software library providing a gradient boosting framework in most of the commonly used data science, machine learning and software development languages. It has it's origins back in 2014, but the first official academic publication on the algorithm was published in 2016 by Tianqi Chen and Carlos Guestrin, from the University of Washington. The algorithm builds upon the previous work on Decision Trees, Bagging, Random Forest, Boosting and Gradient Boosting.


Machine Learning Algorithms Explained in Less Than 1 Minute Each - KDnuggets

#artificialintelligence

This article will explain some of the most well known machine learning algorithms in less than a minute - helping everyone to understand them! One of the simplest Machine learning algorithms out there, Linear Regression is used to make predictions on continuous dependent variables with knowledge from independent variables. A dependent variable is the effect, in which its value depends on changes in the independent variable. You may remember the line of best fit from school - this is what Linear Regression produces. A simple example is predicting one's weight depending on their height.