AITopics | Ensemble Learning

Collaborating Authors

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Attention-based Random Forest and Contamination Model

Utkin, Lev V., Konstantinov, Andrei V.

arXiv.org Artificial IntelligenceJan-8-2022

A new approach called ABRF (the attention-based random forest) and its modifications for applying the attention mechanism to the random forest (RF) for regression and classification are proposed. The main idea behind the proposed ABRF models is to assign attention weights with trainable parameters to decision trees in a specific way. The weights depend on the distance between an instance, which falls into a corresponding leaf of a tree, and instances, which fall in the same leaf. This idea stems from representation of the Nadaraya-Watson kernel regression in the form of a RF. Three modifications of the general approach are proposed. The first one is based on applying the Huber's contamination model and on computing the attention weights by solving quadratic or linear optimization problems. The second and the third modifications use the gradient-based algorithms for computing trainable parameters. Numerical experiments with various regression and classification datasets illustrate the proposed method.

abrf-1, dataset, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2201.0288

Country:

Asia > Russia (0.14)
North America > United States > New York (0.04)
Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.91)
(2 more...)

Add feedback

A Fair and Efficient Hybrid Federated Learning Framework based on XGBoost for Distributed Power Prediction

Liu, Haizhou, Zhang, Xuan, Shen, Xinwei, Sun, Hongbin

arXiv.org Artificial IntelligenceJan-8-2022

In a modern power system, real-time data on power generation/consumption and its relevant features are stored in various distributed parties, including household meters, transformer stations and external organizations. To fully exploit the underlying patterns of these distributed data for accurate power prediction, federated learning is needed as a collaborative but privacy-preserving training scheme. However, current federated learning frameworks are polarized towards addressing either the horizontal or vertical separation of data, and tend to overlook the case where both are present. Furthermore, in mainstream horizontal federated learning frameworks, only artificial neural networks are employed to learn the data patterns, which are considered less accurate and interpretable compared to tree-based models on tabular datasets. To this end, we propose a hybrid federated learning framework based on XGBoost, for distributed power prediction from real-time external features. In addition to introducing boosted trees to improve accuracy and interpretability, we combine horizontal and vertical federated learning, to address the scenario where features are scattered in local heterogeneous parties and samples are scattered in various local districts. Moreover, we design a dynamic task allocation scheme such that each party gets a fair share of information, and the computing power of each party can be fully leveraged to boost training efficiency. A follow-up case study is presented to justify the necessity of adopting the proposed framework. The advantages of the proposed framework in fairness, efficiency and accuracy performance are also confirmed.

federated learning, learning, node, (16 more...)

arXiv.org Artificial Intelligence

2201.02783

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Nepal (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology (1.00)
Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Add feedback

Dynamic GPU Energy Optimization for Machine Learning Training Workloads

Wang, Farui, Zhang, Weizhe, Lai, Shichao, Hao, Meng, Wang, Zheng

arXiv.org Artificial IntelligenceJan-5-2022

GPUs are widely used to accelerate the training of machine learning workloads. As modern machine learning models become increasingly larger, they require a longer time to train, leading to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the target workload behavior, GPOEO utilizes GPU performance counters. To reduce the performance counter profiling overhead, it uses an analytical model to detect the training iteration change and only collects performance counter data when an iteration shift is detected. GPOEO employs multi-objective models based on gradient boosting and a local search algorithm to find a trade-off between execution time and energy consumption. We evaluate the GPOEO by applying it to 71 machine learning workloads from two AI benchmark suites running on an NVIDIA RTX3080Ti GPU. Compared with the NVIDIA default scheduling strategy, GPOEO delivers a mean energy saving of 16.2% with a modest average execution time increase of 5.1%.

application, clock frequency, frequency, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TPDS.2021.3137867

2201.01684

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.84)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
(2 more...)

Add feedback

Four interpretable algorithms that you should use in 2022

#artificialintelligenceJan-4-2022, 16:25:42 GMT

The new year has begun, and it is the time for good resolutions. One of them could be to make decision-making processes more interpretable. To help you do this, I present four interpretable rule-based algorithms. These four algorithms share the use of ensemble of decision trees as rule generator (like Random Forest, AdaBoost, Gradient Boosting, etc.). In other words, each of these interpretable algorithms starts its process by fitting a black box model and generating an interpretable rule ensemble model.

interpretability, interpretable algorithm

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.60)

Add feedback

Introduction to Binary Classification with PyCaret - KDnuggets

#artificialintelligenceDec-30-2021, 18:31:52 GMT

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle exponentially and makes you more productive. In comparison with the other open-source machine learning libraries, PyCaret is an alternate low-code library that can be used to replace hundreds of lines of code with few lines only. This makes experiments exponentially fast and efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and a few more.

dataset, model function, pycaret, (17 more...)

#artificialintelligence

Country:

North America > United States > California > Orange County > Irvine (0.04)
Asia > Taiwan (0.04)

Genre: Workflow (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.34)

Add feedback

The SAMME.C2 algorithm for severely imbalanced multi-class classification

So, Banghee, Valdez, Emiliano A.

arXiv.org Machine LearningDec-29-2021

Classification predictive modeling involves the accurate assignment of observations in a dataset to target classes or categories. There is an increasing growth of real-world classification problems with severely imbalanced class distributions. In this case, minority classes have much fewer observations to learn from than those from majority classes. Despite this sparsity, a minority class is often considered the more interesting class yet developing a scientific learning algorithm suitable for the observations presents countless challenges. In this article, we suggest a novel multi-class classification algorithm specialized to handle severely imbalanced classes based on the method we refer to as SAMME.C2. It blends the flexible mechanics of the boosting techniques from SAMME algorithm, a multi-class classifier, and Ada.C2 algorithm, a cost-sensitive binary classifier designed to address highly class imbalances. Not only do we provide the resulting algorithm but we also establish scientific and statistical formulation of our proposed SAMME.C2 algorithm. Through numerical experiments examining various degrees of classifier difficulty, we demonstrate consistent superior performance of our proposed model.

algorithm, classifier, samme, (16 more...)

arXiv.org Machine Learning

2112.14868

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
North America > United States > Michigan (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)

Add feedback

[100%OFF] Machine Learning & Deep Learning in Python & R

#artificialintelligenceDec-26-2021, 05:45:43 GMT

Learn how to solve real life problem using the Machine learning techniques Machine Learning models such as Linear Regression, Logistic Regression, KNN etc. Advanced Machine Learning models such as Decision trees, XGBoost, Random Forest, SVM etc. Understanding of basics of statistics and concepts of Machine Learning How to do basic statistical operations and run ML models in Python Indepth knowledge of data collection and data preprocessing for Machine Learning problem How to convert business problem into a Machine learning problem Can I get a certificate after completing the course? Are there any other coupons available for this course? Note: 100% OFF Udemy coupon codes are valid for maximum 3 days only. Look for "ENROLL NOW" button at the end of the post. Disclosure: This post may contain affiliate links and we may get small commission if you make a purchase.

learning, machine learning, machine learning & deep learning, (13 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education > Focused Education > Special Education (0.58)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.44)

Add feedback

House Price Prediction using a Random Forest Classifier

#artificialintelligenceDec-21-2021, 13:57:07 GMT

In this blog post, I will use machine learning and Python for predicting house prices. I will use a Random Forest Classifier (in fact Random Forest regression). In the end, I will demonstrate my Random Forest Python algorithm! There is no law except the law that there is no law. Data Science is about discovering hidden patterns (laws) in your data.

house price prediction, random forest classifier, random forest regression, (10 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Explanation of Machine Learning Models Using Shapley Additive Explanation and Application for Real Data in Hospital

Nohara, Yasunobu, Matsumoto, Koutarou, Soejima, Hidehisa, Nakashima, Naoki

arXiv.org Machine LearningDec-21-2021

When using machine learning techniques in decision-making processes, the interpretability of the models is important. In the present paper, we adopted the Shapley additive explanation (SHAP), which is based on fair profit allocation among many stakeholders depending on their contribution, for interpreting a gradient-boosting decision tree model using hospital data. For better interpretability, we propose two novel techniques as follows: (1) a new metric of feature importance using SHAP and (2) a technique termed feature packing, which packs multiple similar features into one grouped feature to allow an easier understanding of the model without reconstruction of the model. We then compared the explanation results between the SHAP framework and existing methods. In addition, we showed how the A/G ratio works as an important prognostic factor for cerebral infarction using our hospital data and proposed techniques.

dependence plot, feature importance, shap dependence plot, (15 more...)

arXiv.org Machine Learning

doi: 10.1016/j.cmpb.2021.106584

2112.11071

Country:

Asia > Japan > Kyūshū & Okinawa > Kyūshū > Kumamoto Prefecture > Kumamoto (0.05)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Hematology (0.89)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.89)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.67)

Add feedback

Parallel XGBoost with Dask in Python

#artificialintelligenceDec-20-2021, 16:50:41 GMT

Out of the box, XGBoost cannot be trained on datasets larger than your computer memory; Python will throw a MemoryError. This tutorial will show you how to go beyond your local machine limitations by leveraging distributed XGBoost with Dask with only minor changes to your existing code. Here is the code we will use if you want to jump right in. By default, XGBoost trains models sequentially. This is fine for basic projects, but as the size of your dataset and/or ML model grows, you may want to consider running XGBoost in distributed mode with Dask to speed up computations and reduce the burden on your local machine.

dask, dataset, xgboost, (12 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Add feedback