 Ensemble Learning


Gradient Boosting Machine

#artificialintelligence

In this article, we discuss an algorithm built on the boosting technique: the Gradient Boosting algorithm, more popularly known as the Gradient Boosting Machine or GBM. Note: if you prefer learning concepts in an audio-visual format, the entire article is also explained in the video below; if not, you may continue reading. The models in a Gradient Boosting Machine are built sequentially, and each subsequent model tries to reduce the error of the previous model.
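The sequential idea can be sketched in a few lines. The following is a minimal illustration, not the article's own code: it assumes squared-error loss, a single numeric feature, and one-split regression stumps as the base model. All function names (`fit_stump`, `fit_gbm`, and so on) are hypothetical.

```python
# Minimal gradient boosting sketch for squared error on one feature.
# Assumption: each base model is a one-split regression stump.

def fit_stump(xs, residuals):
    """Pick the threshold that best splits the residuals (lowest squared error)."""
    best = None
    # The largest x cannot be a threshold: the right side would be empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm


def fit_gbm(xs, ys, n_rounds=50, learning_rate=0.1):
    """Build stumps one after another, each fitted to the current residuals."""
    base = sum(ys) / len(ys)          # initial model: the mean of the targets
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # error of the model so far
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_gbm(model, x):
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict_gbm(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data: y = x^2 on 20 points.
xs = [float(i) for i in range(20)]
ys = [x * x for x in xs]
mse_1 = mse(fit_gbm(xs, ys, n_rounds=1), xs, ys)
mse_50 = mse(fit_gbm(xs, ys, n_rounds=50), xs, ys)
```

On this toy data the training error after 50 rounds is lower than after one round, because every new stump is fitted to what the previous models still get wrong.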


Ensemble Learning: Bagging & Boosting

#artificialintelligence

The bias-variance tradeoff is one of the key concerns when working with machine learning algorithms. Fortunately, there are Ensemble Learning techniques that machine learning practitioners can use to tackle it: bagging and boosting. Bagging, or Bootstrap Aggregation, was formally introduced by Leo Breiman in 1996 [3]. Bagging is an Ensemble Learning technique that aims to reduce learning error by combining a set of homogeneous machine learning algorithms. The key idea is to train multiple base learners separately, each on a random sample from the training set, and then combine them through voting or averaging to produce a more stable and accurate model.
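As a rough sketch of that idea, assuming a regression task and a 1-nearest-neighbour base learner (both chosen here only for illustration, and all names hypothetical), bagging can be written as:

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) points with replacement from the training set."""
    return [rng.choice(data) for _ in data]


def fit_1nn(sample):
    """Base learner (assumption): a 1-nearest-neighbour regressor on (x, y) pairs."""
    def predict(x):
        return min(sample, key=lambda point: abs(point[0] - x))[1]
    return predict


def fit_bagging(data, n_learners=25, seed=0):
    """Train each base learner separately on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_1nn(bootstrap_sample(data, rng)) for _ in range(n_learners)]


def predict_bagging(learners, x):
    """Aggregate by averaging the individual predictions."""
    return sum(learner(x) for learner in learners) / len(learners)


# Toy data: a noisy line y = 2x.
noise = random.Random(42)
data = [(i / 10, 2 * (i / 10) + noise.gauss(0, 0.5)) for i in range(50)]
ensemble = fit_bagging(data)
prediction = predict_bagging(ensemble, 2.5)
```

A single 1-nearest-neighbour model copies the noise of whichever point is closest. Averaging over learners trained on different bootstrap samples smooths that out, which is the stabilising effect described above.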


How I scored 100% accuracy using XGBoost to predict on Parkinson's Disease cases

#artificialintelligence

Parkinson's disease is related to depression and even Alzheimer's. People who suffer from depression can also develop Parkinson's because both conditions involve how the brain produces dopamine, a feel-good neurotransmitter. I selected this project because it deals with a real-world medical issue and can help doctors determine whether a person has Parkinson's and how the disease is likely to progress. This DataFlair project can be found at the following link: Python Machine Learning Project -- Detecting Parkinson's Disease with XGBoost -- DataFlair (data-flair.training) ... Colorado, who recorded the speech signals.


The Coolest Data Science Library I Found in 2021

#artificialintelligence

I became a data scientist because I like finding solutions to complex problems; the creative part of the job and the insights I gain from the data are what I enjoy the most. The boring stuff, like cleaning data, preprocessing, and tuning hyperparameters, brings me little joy, and that's why I try to automate these tasks as much as possible. If you also like automating the boring stuff, you will love the library I am about to introduce in this article. As I mentioned in a previous article, the current state of the art in machine learning is dominated by deep learning for perceptual problems and by boosting methods for regression problems. Nobody uses the linear regression from Scikit-Learn to predict house prices in a Kaggle competition these days, because the XGBoost method is simply more accurate.


The Success of AdaBoost and Its Application in Portfolio Management

arXiv.org Machine Learning

Equal-weighted portfolios are one of the most important strategies in portfolio management. They are portfolios with weights equally distributed across the selected securities in the long and/or short positions. In academic research, numerous studies have suggested that equal-weighted portfolios have better out-of-sample performance than other portfolios (e.g., Jobson and Korkie 1981; James 2003; DeMiguel et al. 2007). Michaud (1989) and DeMiguel et al. (2007) argued that equal-weighted strategies do not suffer from the estimation error of the covariance matrix, which is vulnerable to outliers (Tu and Zhou, 2011). In industry, equal-weighted portfolios are popular in portfolio management practice, particularly among hedge funds. MSCI has issued many equal-weighted indexes, which are "some of the oldest and best-known factor strategies that have aimed to identify specific characteristics of stocks generating excess return"


RAPIDS and Amazon SageMaker: Scale up and scale out to tackle ML challenges

#artificialintelligence

In this post, we combine the powers of NVIDIA RAPIDS and Amazon SageMaker to accelerate hyperparameter optimization (HPO). HPO runs many training jobs on your dataset using different settings to find the best-performing model configuration. HPO helps data scientists reach top performance, and is applied when models go into production, or to periodically refresh deployed models as new data arrives. However, HPO can feel out of reach on non-accelerated platforms as dataset sizes continue to grow. With RAPIDS and SageMaker working together, workloads like HPO are GPU scaled up (multi-GPU) within a node and cloud scaled out over parallel instances.
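To make the HPO loop concrete, here is a minimal random-search sketch. It does not use RAPIDS or SageMaker. `train_and_score` is a hypothetical stand-in for a real training job, and the parameter names and ranges are assumptions chosen for illustration.

```python
import random


def train_and_score(params):
    """Hypothetical stand-in for one training job; returns a validation score.

    Assumption: the toy score peaks at max_depth=6 and learning_rate=0.1.
    """
    return (-((params["max_depth"] - 6) ** 2)
            - 100 * (params["learning_rate"] - 0.1) ** 2)


def random_search(n_trials, seed=0):
    """Run n_trials jobs with different settings and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "max_depth": rng.randint(2, 12),
            "learning_rate": rng.uniform(0.01, 0.3),
        }
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(n_trials=50)
```

Each trial is independent of the others, which is why this kind of workload can be spread over parallel instances.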


Hybrid stacked ensemble combined with genetic algorithms for Prediction of Diabetes

arXiv.org Artificial Intelligence

Diabetes is currently one of the most common, dangerous, and costly diseases in the world, and is caused by an increase in blood sugar or a decrease in insulin in the body. Diabetes can have detrimental effects on people's health if diagnosed late. Today, diabetes has become one of the challenges for health and government officials. Prevention is a priority, and taking care of people's health without compromising their comfort is an essential need. In this study, an Ensemble training methodology based on genetic algorithms is used to accurately diagnose and predict the outcomes of diabetes mellitus. We use experimental data: real data on Indian diabetics from the University of California website. Current developments in ICT, such as the Internet of Things, machine learning, and data mining, allow us to provide health strategies with more intelligent capabilities to accurately predict the outcomes of the disease in daily life and in the hospital, and to prevent the progression of this disease and its many complications. The results show the high performance of the proposed method in diagnosing the disease, which has reached 98.8% and 99% accuracy in this study.


Prediction Models for AKI in ICU: A Comparative Study

#artificialintelligence

Purpose: To assess the performance of models for early prediction of acute kidney injury (AKI) in the Intensive Care Unit (ICU) setting. Patients and Methods: Data were collected from the Medical Information Mart for Intensive Care (MIMIC)-III database for all patients aged 18 years who had their serum creatinine (SCr) level measured for 72 h following ICU admission. Those with existing conditions of kidney disease upon ICU admission were excluded from our analyses. Seventeen predictor variables comprising patient demographics and physiological indicators were selected on the basis of the Kidney Disease Improving Global Outcomes (KDIGO) and medical literature. Six models from three types of methods were tested: Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Decision Machine (LightGBM), and Convolutional Neural Network (CNN).


Interpretable Machines: Constructing Valid Prediction Intervals with Random Forests

arXiv.org Machine Learning

An important issue when using Machine Learning algorithms in recent research is the lack of interpretability. Although these algorithms provide accurate point predictions for various learning problems, uncertainty estimates connected with point predictions are rather sparse. A contribution to this gap for the Random Forest Regression Learner is presented here. Based on its Out-of-Bag procedure, several parametric and non-parametric prediction intervals are provided for Random Forest point predictions, and theoretical guarantees for their correct coverage probability are delivered. In a second part, a thorough investigation through Monte-Carlo simulation is conducted, evaluating the performance of the proposed methods from three aspects: (i) analyzing the correct coverage rate of the proposed prediction intervals, (ii) inspecting interval width, and (iii) verifying the competitiveness of the proposed intervals with existing methods. The simulation shows that the proposed prediction intervals are robust towards non-normal residual distributions and are competitive by providing correct coverage rates and comparably narrow interval lengths, even for comparably small samples.


Machine Learning with ML.NET - Random Forest

#artificialintelligence

One of the most popular ways to build ensembles is to use the same algorithm multiple times, each time on a different subset of the training dataset. The techniques used for this are called bagging and pasting. The only difference between them is that, when building subsets, bagging allows a training instance to be sampled several times for the same predictor, while pasting does not. Once all algorithms are trained, the ensemble makes a prediction by aggregating the predictions of all of them. For classification, this is usually done by hard voting, while for regression the average of the results is taken.
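The difference between the two sampling schemes, and the two aggregation rules, can be sketched as follows. This is plain Python rather than ML.NET, and the helper names are hypothetical.

```python
import random
from collections import Counter


def draw_subset(data, size, rng, replace):
    """Bagging samples with replacement; pasting samples without."""
    if replace:
        return [rng.choice(data) for _ in range(size)]   # duplicates allowed
    return rng.sample(data, size)                        # every item at most once


def hard_vote(labels):
    """Classification: return the most common predicted label.

    Ties go to the label that appeared first in the list.
    """
    return Counter(labels).most_common(1)[0][0]


def average(values):
    """Regression: return the mean of the individual predictions."""
    return sum(values) / len(values)


rng = random.Random(0)
data = list(range(10))
bagging_subset = draw_subset(data, 6, rng, replace=True)
pasting_subset = draw_subset(data, 6, rng, replace=False)
```

For example, `hard_vote(["cat", "dog", "cat"])` returns `"cat"`, and `average([1.0, 2.0, 3.0])` returns `2.0`.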