AITopics | Ensemble Learning

XGBOOST -- IN A NUTSHELL

#artificialintelligenceDec-17-2021, 00:30:33 GMT

XGBoost stands for "Extreme Gradient Boosting". It is a decision tree-based algorithm which is used in Machine Learning. XGBoost makes use of the gradient boosting framework. Decision Trees are structures that consists of a set of leaf nodes, branches as well as internal nodes. Each leaf node represents a Class Label. The internal node represents the attributes while the branches connect the leaves to these internal nodes.

algorithm, prediction, weak model, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

An overview of active learning methods for insurance with fairness appreciation

Elie, Romuald, Hillairet, Caroline, Hu, François, Juillard, Marc

arXiv.org Machine LearningDec-17-2021

This paper addresses and solves some challenges in the adoption of machine learning in insurance with the democratization of model deployment. The first challenge is reducing the labelling effort (hence focusing on the data quality) with the help of active learning, a feedback loop between the model inference and an oracle: as in insurance the unlabeled data is usually abundant, active learning can become a significant asset in reducing the labelling cost. For that purpose, this paper sketches out various classical active learning methodologies before studying their empirical impact on both synthetic and real datasets. Another key challenge in insurance is the fairness issue in model inferences. We will introduce and integrate a post-processing fairness for multi-class tasks in this active learning framework to solve these two issues. Finally numerical experiments on unfair datasets highlight that the proposed setup presents a good compromise between model precision and fairness.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

2112.09466

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.93)
Banking & Finance > Insurance (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)
(3 more...)

Add feedback

Identification of Twitter Bots Based on an Explainable Machine Learning Framework: The US 2020 Elections Case Study

Shevtsov, Alexander, Tzagkarakis, Christos, Antonakaki, Despoina, Ioannidis, Sotiris

arXiv.org Artificial IntelligenceDec-14-2021

Twitter is one of the most popular social networks attracting millions of users, while a considerable proportion of online discourse is captured. It provides a simple usage framework with short messages and an efficient application programming interface (API) enabling the research community to study and analyze several aspects of this social network. However, the Twitter usage simplicity can lead to malicious handling by various bots. The malicious handling phenomenon expands in online discourse, especially during the electoral periods, where except the legitimate bots used for dissemination and communication purposes, the goal is to manipulate the public opinion and the electorate towards a certain direction, specific ideology, or political party. This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data. To this end, a supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm, where the hyper-parameters are tuned via cross-validation. Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions by calculating feature importance, using the game theoretic-based Shapley values. Experimental evaluation on distinct Twitter datasets demonstrate the superiority of our approach, in terms of bot detection accuracy, when compared against a recent state-of-the-art Twitter bot detection method.

artificial intelligence, machine learning, social media, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1609/icwsm.v16i1.19349

2112.04913

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Machine Learning-based Prediction of Porosity for Concrete Containing Supplementary Cementitious Materials

Cao, Chong

arXiv.org Artificial IntelligenceDec-13-2021

Porosity has been identified as the key indicator of the durability properties of concrete exposed to aggressive environments. This paper applies ensemble learning to predict porosity of high-performance concrete containing supplementary cementitious materials. The concrete samples utilized in this study are characterized by eight composition features including w/b ratio, binder content, fly ash, GGBS, superplasticizer, coarse/fine aggregate ratio, curing condition and curing days. The assembled database consists of 240 data records, featuring 74 unique concrete mixture designs. The proposed machine learning algorithms are trained on 180 observations (75%) chosen randomly from the data set and then tested on the remaining 60 observations (25%). The numerical experiments suggest that the regression tree ensembles can accurately predict the porosity of concrete from its mixture compositions. Gradient boosting trees generally outperforms random forests in terms of prediction accuracy. For random forests, the out-of-bag error based hyperparameter tuning strategy is found to be much more efficient than k-Fold Cross-Validation.

artificial intelligence, fly ash, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.rineng.2022.100794

2112.07353

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.48)

Industry:

Materials > Construction Materials (1.00)
Energy > Oil & Gas > Upstream (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Artificial Intelligence and Design of Experiments for Assessing Security of Electricity Supply: A Review and Strategic Outlook

Priesmann, Jan, Münch, Justin, Ridha, Elias, Spiegel, Thomas, Reich, Marius, Adam, Mario, Nolting, Lars, Praktiknjo, Aaron

arXiv.org Artificial IntelligenceDec-13-2021

Assessing the effects of the energy transition and liberalization of energy markets on resource adequacy is an increasingly important and demanding task. The rising complexity in energy systems requires adequate methods for energy system modeling leading to increased computational requirements. Furthermore, with complexity, uncertainty increases likewise calling for probabilistic assessments and scenario analyses. To adequately and efficiently address these various requirements, new methods from the field of data science are needed to accelerate current methods. With our systematic literature review, we want to close the gap between the three disciplines (1) assessment of security of electricity supply, (2) artificial intelligence, and (3) design of experiments. For this, we conduct a large-scale quantitative review on selected fields of application and methods and make a synthesis that relates the different disciplines to each other. Among other findings, we identify metamodeling of complex security of electricity supply models using AI methods and applications of AI-based methods for forecasts of storage dispatch and (non-)availabilities as promising fields of application that have not sufficiently been covered, yet. We end with deriving a new methodological pipeline for adequately and efficiently addressing the present and upcoming challenges in the assessment of security of electricity supply.

bayesian model, ensemble method, neural network, (16 more...)

arXiv.org Artificial Intelligence

2112.04889

Country:

North America > United States > New Jersey > Middlesex County > Piscataway (0.14)
North America > United States > New York > New York County > New York City (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
(33 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Energy > Renewable > Solar (1.00)
Energy > Power Industry > Utilities (1.00)
Energy > Renewable > Wind (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(7 more...)

Add feedback

Confidence intervals for the random forest generalization error

F., Marques, C, Paulo

arXiv.org Machine LearningDec-11-2021

How confident can we be in the generalization capacity of a predictive model? Of the many devices discussed in the statistical learning literature [1, 2, 3], a simple random split of the original data into training and test sets, and methods of folded cross-validation, stand out as the most common tools used to tackle the generalization issue. Availability of point estimates for the generalization error given by these procedures naturally raises the question of how to quantify the uncertainty involved in these estimates spending a manageable computational cost. Random forests [4] elegantly provide an alternative low cost (almost free) point estimate of the generalization error without requiring splittings of the data, and avoiding the computational burden of retraining the predictive model several times. The bagging mechanism [5] used to construct the ensemble of trees implies that each training data point is not used (stays "out-of-bag") when growing approximately 36.8% of the trees in the forest. This property gives us the so called out-of-bag estimate of the random forest generalization error: for each observation, using a suitable loss function, we compute the predictive error made by the random subforest whose trees didn't include the observation under consideration in its training process; the out-of-bag estimate is the average of these prediction errors over the whole training sample. 1

confidence interval, generalization error, training sample, (13 more...)

arXiv.org Machine Learning

2112.06101

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Iowa > Story County > Ames (0.05)
South America > Brazil > São Paulo (0.04)

Genre: Research Report (0.50)

Industry: Telecommunications (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.87)

Add feedback

Efficient Batch Homomorphic Encryption for Vertically Federated XGBoost

#artificialintelligenceDec-9-2021, 01:16:54 GMT

More and more orgainizations and institutions make efforts on using external data to improve the performance of AI services. To address the data privacy and security concerns, federated learning has attracted increasing attention from both academia and industry to securely construct AI models across multiple isolated data providers. In this paper, we studied the efficiency problem of adapting widely used XGBoost model in real-world applications to vertical federated learning setting. State-of-the-art vertical federated XGBoost frameworks requires large number of encryption operations and ciphertext transmissions, which makes the model training much less efficient than training XGBoost models locally. To bridge this gap, we proposed a novel batch homomorphic encryption method to cut the cost of encryption-related computation and transmission in nearly half. This is achieved by encoding the first-order derivative and the second-order derivative into a single number for encryption, ciphertext transmission, and homomorphic addition operations. The sum of multiple first-order derivatives and second-order derivatives can be simultaneously decoded from the sum of encoded values. We are motivated by the batch idea in the work of BatchCrypt for horizontal federated learning, and design a novel batch method to address the limitations of allowing quite few number of negative numbers. The encode procedure of the proposed batch method consists of four steps, including shifting, truncating, quantizing and batching, while the decoding procedure consists of de-quantization and shifting back. The advantages of our method are demonstrated through theoretical analysis and extensive numerical experiments.

efficient batch homomorphic encryption, federated xgboost

#artificialintelligence

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Add feedback

A Comparative Analysis of Machine Learning Techniques for IoT Intrusion Detection

Vitorino, João, Andrade, Rui, Praça, Isabel, Sousa, Orlando, Maia, Eva

arXiv.org Artificial IntelligenceDec-8-2021

The digital transformation faces tremendous security challenges. In particular, the growing number of cyber-attacks targeting Internet of Things (IoT) systems restates the need for a reliable detection of malicious network activity. This paper presents a comparative analysis of supervised, unsupervised and reinforcement learning techniques on nine malware captures of the IoT-23 dataset, considering both binary and multi-class classification scenarios. The developed models consisted of Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Isolation Forest (iForest), Local Outlier Factor (LOF) and a Deep Reinforcement Learning (DRL) model based on a Double Deep Q-Network (DDQN), adapted to the intrusion detection context. The most reliable performance was achieved by LightGBM. Nonetheless, iForest displayed good anomaly detection results and the DRL model demonstrated the possible benefits of employing this methodology to continuously improve the detection. Overall, the obtained results indicate that the analyzed techniques are well suited for IoT intrusion detection.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2111.13149

Country:

North America > United States (0.04)
Europe > Portugal > Porto > Porto (0.04)

Genre:

Overview (0.68)
Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(2 more...)

Add feedback

Efficient Batch Homomorphic Encryption for Vertically Federated XGBoost

Xu, Wuxing, Fan, Hao, Li, Kaixin, Yang, Kai

arXiv.org Artificial IntelligenceDec-8-2021

More and more orgainizations and institutions make efforts on using external data to improve the performance of AI services. To address the data privacy and security concerns, federated learning has attracted increasing attention from both academia and industry to securely construct AI models across multiple isolated data providers. In this paper, we studied the efficiency problem of adapting widely used XGBoost model in real-world applications to vertical federated learning setting. State-of-the-art vertical federated XGBoost frameworks requires large number of encryption operations and ciphertext transmissions, which makes the model training much less efficient than training XGBoost models locally. To bridge this gap, we proposed a novel batch homomorphic encryption method to cut the cost of encryption-related computation and transmission in nearly half. This is achieved by encoding the first-order derivative and the second-order derivative into a single number for encryption, ciphertext transmission, and homomorphic addition operations. The sum of multiple first-order derivatives and second-order derivatives can be simultaneously decoded from the sum of encoded values. We are motivated by the batch idea in the work of BatchCrypt for horizontal federated learning, and design a novel batch method to address the limitations of allowing quite few number of negative numbers. The encode procedure of the proposed batch method consists of four steps, including shifting, truncating, quantizing and batching, while the decoding procedure consists of de-quantization and shifting back. The advantages of our method are demonstrated through theoretical analysis and extensive numerical experiments.

active party, federated learning, transmission, (14 more...)

arXiv.org Artificial Intelligence

2112.04261

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Add feedback

Regression in Python using Sklearn, XGBoost and PySpark

#artificialintelligenceDec-7-2021, 01:55:28 GMT

In the above story, we have used a Fitbit dataset. Based on the EDA, it was found that steps taken and calories are somewhat linearly correlated and together they may be indicative of a lower risk for all-cause mortality. More interestingly, among our data there is one dataset which has not been used yet which is a weight and BMI log. These data have a distinct nature since they are not necessarily machine generated, thereafter they serve the purpose of being'labels'. In simple words, users are collecting data regarding their activity using their Fitbit, and once in a while, they log some body information such as weight, fat and BMI. This creates an optimal scenario for a supervised learning problem, where, for example, we could use the Fitbit activity data to predict the BMI of a user.

regression, supervised learning problem, xgboost and pyspark, (8 more...)

#artificialintelligence

Industry: Education > Focused Education > Special Education (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.53)

Add feedback

Filters

Collaborating Authors

Ensemble Learning

XGBOOST -- IN A NUTSHELL

An overview of active learning methods for insurance with fairness appreciation

Identification of Twitter Bots Based on an Explainable Machine Learning Framework: The US 2020 Elections Case Study

Machine Learning-based Prediction of Porosity for Concrete Containing Supplementary Cementitious Materials

Artificial Intelligence and Design of Experiments for Assessing Security of Electricity Supply: A Review and Strategic Outlook

Confidence intervals for the random forest generalization error

Efficient Batch Homomorphic Encryption for Vertically Federated XGBoost

A Comparative Analysis of Machine Learning Techniques for IoT Intrusion Detection

Efficient Batch Homomorphic Encryption for Vertically Federated XGBoost

Regression in Python using Sklearn, XGBoost and PySpark