AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

Context-aware Retail Product Recommendation with Regularized Gradient Boosting

Das, Sourya Dipta, Basak, Ayan

arXiv.org Artificial Intelligence Sep-17-2021

In the FARFETCH Fashion Recommendation challenge, participants needed to predict the order in which various products would be shown to a user in a recommendation impression. The data was provided in two phases: a validation phase and a test phase. The validation phase had a labelled training set that contained a binary column indicating whether a product had been clicked or not. The dataset comprises over 5,000,000 recommendation events, 450,000 products and 230,000 unique users. It represents real, unbiased, but anonymised, interactions of actual users of the FARFETCH platform. The final evaluation was based on performance in the second phase. A total of 167 participants took part in the challenge, and we secured the 6th rank in the final evaluation with an MRR of 0.4658 on the test set. We have designed a unique context-aware system that takes the similarity of a product to the user context into account to rank products more effectively. After the evaluation, we fine-tuned our approach to reach an MRR of 0.4784 on the test set, which would have placed us in 3rd position.
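
The MRR (mean reciprocal rank) figures quoted above can be made concrete with a minimal sketch. The impressions, product names, and the function `mean_reciprocal_rank` below are hypothetical illustrations, not the challenge's scoring code; the sketch assumes one clicked product per impression.

```python
def mean_reciprocal_rank(ranked_lists, clicked_items):
    """Average of 1/rank of the clicked item over all impressions."""
    total = 0.0
    for ranking, clicked in zip(ranked_lists, clicked_items):
        # Ranks are 1-based; an impression whose clicked item is
        # missing from the ranking contributes 0.
        if clicked in ranking:
            total += 1.0 / (ranking.index(clicked) + 1)
    return total / len(ranked_lists)


# Hypothetical impressions: each list is a predicted product order.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
clicked = ["a", "e", "i"]  # clicked at ranks 1, 2 and 3

mrr = mean_reciprocal_rank(rankings, clicked)
print(round(mrr, 4))
```

An MRR near 0.47 therefore means that, on average, the clicked product sits at roughly the second position of the predicted order.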

impression, particular product, product price, (13 more...)

arXiv.org Artificial Intelligence

2109.08561

Country: North America > United States (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)

WildWood: a new Random Forest algorithm

Gaïffas, Stéphane, Merad, Ibrahim, Yu, Yiyang

arXiv.org Machine Learning Sep-16-2021

We introduce WildWood (WW), a new ensemble algorithm for supervised learning of Random Forest (RF) type. While standard RF algorithms use bootstrap out-of-bag samples to compute out-of-bag scores, WW uses these samples to produce improved predictions given by an aggregation of the predictions of all possible subtrees of each fully grown tree in the forest. This is achieved by aggregation with exponential weights computed over out-of-bag samples, that are computed exactly and very efficiently thanks to an algorithm called context tree weighting. This improvement, combined with a histogram strategy to accelerate split finding, makes WW fast and competitive compared with other well-established ensemble methods, such as standard RF and extreme gradient boosting algorithms.
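
As a rough illustration of where out-of-bag samples come from, the sketch below draws one bootstrap sample and collects the indices that were never drawn. It only shows the in-bag / out-of-bag split that Random Forest type methods rely on; it is not the WildWood aggregation itself, and `bootstrap_split` is a hypothetical helper name.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    # Out-of-bag indices are those never drawn for this tree.
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_split(1000, rng)
oob_fraction = len(out_of_bag) / 1000
print(oob_fraction)  # expected to be close to 1/e, about 0.37
```

Each tree thus leaves roughly a third of the training set unused, which is the data a standard forest scores on and which, per the abstract, WildWood uses to weight subtree predictions.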

algorithm, node, prediction, (16 more...)

arXiv.org Machine Learning

2109.0801

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Fake News Detection Using Machine Learning Ensemble Methods

#artificialintelligence Sep-14-2021, 03:10:13 GMT

The advent of the World Wide Web and the rapid adoption of social media platforms (such as Facebook and Twitter) paved the way for information dissemination on a scale never before witnessed in human history. With the current usage of social media platforms, consumers are creating and sharing more information than ever before, some of which is misleading and has no relevance to reality. Automated classification of a text article as misinformation or disinformation is a challenging task. Even an expert in a particular domain has to explore multiple aspects before giving a verdict on the truthfulness of an article. In this work, we propose to use a machine learning ensemble approach for automated classification of news articles. Our study explores different textual properties that can be used to distinguish fake content from real. By using those properties, we train a combination of different machine learning algorithms using various ensemble methods and evaluate their performance on 4 real-world datasets. Experimental evaluation confirms the superior performance of our proposed ensemble learner approach in comparison to individual learners. Besides other use cases, news outlets benefitted from the widespread use of social media platforms by providing updated news in near real time to their subscribers. The news media evolved from newspapers, tabloids, and magazines to a digital form such as online news platforms, blogs, social media feeds, and other digital media formats [1]. It became easier for consumers to acquire the latest news at their fingertips.
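
The abstract does not say which ensemble methods were used, so the following is only a sketch of one common option, hard (majority) voting, under the assumption that each base model has already labelled every article. The model outputs and the helper `majority_vote` are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per article."""
    combined = []
    # zip(*predictions) groups the votes cast for the same article.
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical labels from three base classifiers for four articles.
model_a = ["fake", "real", "fake", "real"]
model_b = ["fake", "fake", "fake", "real"]
model_c = ["real", "real", "fake", "fake"]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Using an odd number of base models with two classes avoids ties; each individual model above mislabels or disagrees on at least one article, while the vote smooths those disagreements out.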

accuracy, algorithm, dataset, (13 more...)

#artificialintelligence

Genre: Research Report > New Finding (0.30)

Industry: Media > News (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)
(2 more...)

Generalized XGBoost Method

Guang, Yang

arXiv.org Machine Learning Sep-14-2021

The XGBoost method has achieved excellent predictive performance in many fields and has exhibited many advantages, and is consequently considered especially suitable for the statistical analysis of big data. However, this method is limited because its loss function must be convex. For many scenario-specific problems, such as non-life insurance pricing, the distribution of predictor variables is often heavy-tailed, so the optimal prediction performance may not be obtained by setting convex loss functions. At the same time, it is important to estimate the probability distribution of predictor variables. When the specified parametric probability distribution contains more than two parameters, it may be necessary to model multiple parameters to obtain better prediction performance. Therefore, a more generalized regularized tree boosting method is required, one whose loss function is not limited to convex functions and which models the tree boosting for multiple parameters, so that it adapts to the most common parametric probability distributions.
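
To see why convexity matters, recall that regularized tree boosting of the XGBoost kind sets each leaf value with a second-order (Newton-style) step, w = -G / (H + lambda), where G and H are the summed gradients and Hessians of the loss in that leaf. The sketch below contrasts squared error with a Cauchy-type loss, log(1 + r^2); the Cauchy loss is an assumption chosen here as a heavy-tail-friendly example and is not taken from the paper, and all function names are hypothetical.

```python
def squared_error_grad_hess(residual):
    # L = 0.5 * r^2  ->  gradient r, Hessian 1 (always positive: convex).
    return residual, 1.0


def cauchy_grad_hess(residual):
    # L = log(1 + r^2)  ->  the Hessian turns negative once |r| > 1,
    # so the loss is not convex.
    denom = 1.0 + residual * residual
    grad = 2.0 * residual / denom
    hess = 2.0 * (1.0 - residual * residual) / denom ** 2
    return grad, hess


def leaf_weight(residuals, grad_hess, reg_lambda=1.0):
    """Newton-style leaf value w = -G / (H + lambda)."""
    grads, hessians = zip(*(grad_hess(r) for r in residuals))
    return -sum(grads) / (sum(hessians) + reg_lambda)


residuals = [-1.0, -2.0, -3.0]  # prediction minus target, per sample
w_convex = leaf_weight(residuals, squared_error_grad_hess, reg_lambda=0.0)
_, h_near = cauchy_grad_hess(0.5)
_, h_far = cauchy_grad_hess(3.0)
print(w_convex, h_near, h_far)
```

With squared error the step recovers the mean correction (2.0 here). With the non-convex loss, samples far from the target contribute negative Hessians, so H + lambda can shrink toward zero or change sign and the plain Newton step stops being a reliable minimizer, which is the limitation the abstract refers to.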

generalized xgboost method, loss function, xgboost method, (14 more...)

arXiv.org Machine Learning

2109.07473

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Industry: Banking & Finance > Insurance (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection

Adler, Afek Ilay, Painsky, Amichai

arXiv.org Machine Learning Sep-12-2021

Gradient Boosting Machines (GBM) are among the go-to algorithms for tabular data and produce state-of-the-art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base learners. Specifically, most implementations utilize decision trees that are typically biased towards categorical variables with large cardinalities. The effect of this bias has been extensively studied over the years, mostly in terms of predictive performance. In this work, we extend the scope and study the effect of biased base learners on GBM feature importance (FI) measures. We show that although these implementations demonstrate highly competitive predictive performance, they still, surprisingly, suffer from bias in FI. By utilizing cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost. We demonstrate the suggested framework in a variety of synthetic and real-world setups, showing a significant improvement in all GBM FI measures while maintaining roughly the same level of prediction accuracy.

categorical feature, fi measure, implementation, (14 more...)

arXiv.org Machine Learning

2109.05468

Country:

Oceania > New Zealand > North Island > Waikato > Hamilton (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.41)

Secondary control activation analysed and predicted with explainable AI

Kruse, Johannes, Schäfer, Benjamin, Witthaut, Dirk

arXiv.org Artificial Intelligence Sep-10-2021

The transition to a renewable energy system poses challenges for power grid operation and stability. Secondary control is key in restoring the power system to its reference following a disturbance. Underestimating the necessary control capacity may require emergency measures, such as load shedding. Hence, a solid understanding of the emerging risks and the driving factors of control is needed. In this contribution, we establish an explainable machine learning model for the activation of secondary control power in Germany. Training gradient boosted trees, we obtain an accurate description of control activation. Using SHapley Additive exPlanation (SHAP) values, we investigate the dependency between control activation and external features such as the generation mix, forecasting errors, and electricity market data. Thereby, our analysis reveals drivers that lead to high reserve requirements in the German power system. Our transparent approach, utilizing open data and making machine learning models interpretable, opens new scientific discovery avenues.
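
SHAP values are built on Shapley values, which attribute a prediction to features by averaging each feature's marginal contribution over all orderings. The brute-force sketch below computes them exactly for a tiny hypothetical model; it assumes that "absent" features are replaced by a fixed baseline value, and it is not the tree-specific algorithm that practical SHAP tooling uses for gradient boosted trees.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)  # start with all features "absent"
        prev = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i to its actual value
            value = model(current)
            phi[i] += value - prev  # marginal contribution of feature i
            prev = value
    return [p / len(orderings) for p in phi]


def toy_model(features):
    # Hypothetical model: two additive terms plus one interaction.
    a, b, c = features
    return 2.0 * a + 1.0 * b + a * c


x = [1.0, 3.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term `a * c` is split evenly between the two features involved.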

afrr, forecast error, prediction, (16 more...)

arXiv.org Artificial Intelligence

2109.04802

Country:

North America > United States > New York (0.04)
Europe > United Kingdom (0.04)
Europe > Norway (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Energy > Power Industry > Utilities (0.47)
Energy > Renewable > Wind (0.47)
Energy > Renewable > Solar (0.47)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.51)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.34)

Understanding Random Forests For Machine Learning

#artificialintelligence Sep-9-2021, 01:40:17 GMT

Random forest has an important place in machine learning for solving regression and classification problems. It is useful for producing results with a machine learning algorithm without hyperparameter tuning. So what does hyperparameter tuning mean?

hyperparameter, machine learning, random forest

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

Automated Security Assessment for the Internet of Things

Duan, Xuanyu, Ge, Mengmeng, Le, Triet H. M., Ullah, Faheem, Gao, Shang, Lu, Xuequan, Babar, M. Ali

arXiv.org Artificial Intelligence Sep-9-2021

Internet of Things (IoT) based applications face an increasing number of potential security risks, which need to be systematically assessed and addressed. Expert-based manual assessment of IoT security is a predominant approach, which is usually inefficient. To address this problem, we propose an automated security assessment framework for IoT networks. Our framework first leverages machine learning and natural language processing to analyze vulnerability descriptions for predicting vulnerability metrics. The predicted metrics are then input into a two-layered graphical security model, which consists of an attack graph at the upper layer to present the network connectivity and an attack tree for each node in the network at the bottom layer to depict the vulnerability information. This security model automatically assesses the security of the IoT network by capturing potential attack paths. We evaluate the viability of our approach using a proof-of-concept smart building system model which contains a variety of real-world IoT devices and potential vulnerabilities. Our evaluation of the proposed framework demonstrates its effectiveness in terms of automatically predicting the vulnerability metrics of new vulnerabilities with more than 90% accuracy, on average, and identifying the most vulnerable attack paths within an IoT network. The produced assessment results can serve as a guideline for cybersecurity professionals to take further actions and mitigate risks in a timely manner.

assessment, attack path, vulnerability, (16 more...)

arXiv.org Artificial Intelligence

2109.04029

Country:

Oceania > Australia > Western Australia > Perth (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > South Australia > Adelaide (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Information Technology > Security & Privacy (1.00)
Commercial Services & Supplies > Security & Alarm Services (1.00)
Government > Military > Cyberwarfare (0.34)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Kaggle Competition -- Finding Donors for a Charity with an AUC of 0.94

#artificialintelligence Sep-7-2021, 07:31:13 GMT

Comparing Random Forest, Gradient Boosting, and XGBoost to select the best model to predict potential donors for a charity. This project will employ three supervised algorithms (Random Forest, Gradient Boosting, and XGBoost) to accurately model individuals' income using the 1994 U.S. Census data. I will then choose the best candidate algorithm from preliminary results and further optimize this algorithm to best model the data. My goal with this implementation is to construct a model that accurately predicts whether an individual makes more than 50,000 dollars. This sort of task can arise in a non-profit setting, where organizations survive on donations.
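
The AUC in the title is the probability that a randomly chosen positive example (here, someone earning more than 50,000 dollars) receives a higher score than a randomly chosen negative one. A minimal pairwise sketch of that definition follows; the labels and scores are made up, and `roc_auc` is a hypothetical helper, not the competition's scorer.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = income above the threshold) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

Here three of the four positive/negative pairs are ordered correctly. The double loop is quadratic in the number of examples, so it suits small illustrations rather than full datasets.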

algorithm, charity, kaggle competition, (7 more...)

#artificialintelligence

Genre: Contests & Prizes (0.43)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)

Machine Learning -- Beginners Guide to Random Forest Classifiers (The Code)

#artificialintelligence Sep-4-2021, 12:28:34 GMT

So if you haven't already checked it out, I have posted about the mathematics behind this machine learning technique. If this is the first time you're coming across this algorithm I recommend you give it a read before jumping into the code. Otherwise, we're going to jump right into it!

beginner guide, machine learning, random forest classifier

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)
