Goto

Collaborating Authors

 Ensemble Learning


Introduction to AdaBoost for Absolute Beginners - Analytics Vidhya

#artificialintelligence

This article was published as a part of the Data Science Blogathon. AdaBoost stands for Adaptive Boosting. It is a statistical classification algorithm. It is an algorithm that forms a committee of weak classifiers. It boosts the performance of machine learning algorithms.


Building a Random Forest Classifier to Predict Neural Spikes

#artificialintelligence

A step-by-step guide to building a Random Forest classifier in Python to predict subtypes of neural extracellular spikes using a real data-set recorded from Human brain organoids. Given the heterogeneity of neurons within the human brain itself, classification tools are commonly utilised to correlate electrical activity with different cell types and/or morphologies. This is a long-standing question in Neuroscience circles, and can be considerably variable between different species, pathologies, brain regions and layers. Fortunately, with the readily increasing computational power allowing improvements in machine-learning and deep-learning algorithms, Neuroscientists are provided with the tools to dive further into asking these important questions. However, as stated by Juavinett et al., for the most part programming skills are underrepresented in the community and new resources to teach them are crucial to solving the complexity of the human brain.


Geometry- and Accuracy-Preserving Random Forest Proximities

arXiv.org Machine Learning

Abstract--Random forests are considered one of the best out-of-the-box classification and regression algorithms due to their high level of predictive performance with relatively little tuning. Pairwise proximities can be computed from a trained random forest which measure the similarity between data points relative to the supervised task. Random forest proximities have been used in many applications including the identification of variable importance, data imputation, outlier detection, and data visualization. However, existing definitions of random forest proximities do not accurately reflect the data geometry learned by the random forest. In this paper, we introduce a novel definition of random forest proximities called Random Forest-Geometry-and Accuracy-Preserving proximities (RF-GAP). We prove that the proximity-weighted sum (regression) or majority vote (classification) using RF-GAP exactly match the out-of-bag random forest prediction, thus capturing the data geometry learned by the random forest. We empirically show that this improved geometric representation outperforms traditional random forest proximities in tasks such as data imputation and provides outlier detection and visualization results consistent with the learned data geometry. ANDOM forests [1] are well-known, powerful predictors comprised of an ensemble of binary recursive was first defined by Leo Breiman as the proportion of decision trees. Random forests are easily adapted for both trees in which the observations reside in the same terminal classification and regression, are trivially parallelizable, can node [16].


H2O.ai

#artificialintelligence

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensemble...


MeltpoolNet: Melt pool Characteristic Prediction in Metal Additive Manufacturing Using Machine Learning

arXiv.org Artificial Intelligence

Characterizing meltpool shape and geometry is essential in metal Additive Manufacturing (MAM) to control the printing process and avoid defects. Predicting meltpool flaws based on process parameters and powder material is difficult due to the complex nature of MAM process. Machine learning (ML) techniques can be useful in connecting process parameters to the type of flaws in the meltpool. In this work, we introduced a comprehensive framework for benchmarking ML for melt pool characterization. An extensive experimental dataset has been collected from more than 80 MAM articles containing MAM processing conditions, materials, meltpool dimensions, meltpool modes and flaw types. We introduced physics-aware MAM featurization, versatile ML models, and evaluation metrics to create a comprehensive learning framework for meltpool defect and geometry prediction. This benchmark can serve as a basis for melt pool control and process optimization. In addition, data-driven explicit models have been identified to estimate meltpool geometry from process parameters and material properties which outperform Rosenthal estimation for meltpool geometry while maintaining interpretability.


Model Generalization in Arrival Runway Occupancy Time Prediction by Feature Equivalences

arXiv.org Artificial Intelligence

General real-time runway occupancy time prediction modelling for multiple airports is a current research gap. An attempt to generalize a real-time prediction model for Arrival Runway Occupancy Time (AROT) is presented in this paper by substituting categorical features by their numerical equivalences. Three days of data, collected from Saab Sensis' Aerobahn system at three US airports, has been used for this work. Three tree-based machine learning algorithms: Decision Tree, Random Forest and Gradient Boosting are used to assess the generalizability of the model using numerical equivalent features. We have shown that the model trained on numerical equivalent features not only have performances at least on par with models trained on categorical features but also can make predictions on unseen data from other airports.



Machine Learning : Random Forest with Python from Scratch

#artificialintelligence

Are you ready to start your path to becoming a Machine Learning expert! Are you ready to train your machine like a father trains his son! A breakthrough in Machine Learning would be worth ten Microsofts." -Bill Gates There are lots of courses and lectures out there regarding random forest. After taking this course, the curtains of machine learning and especially random forest will be lifted for you. You'll be learning a state-of-the-art algorithm in details with practical implementation.


Fraud Detection with EvalML

#artificialintelligence

Data analytics has created a great impact in the banking and financial services industry, for example, by providing insights of global financial trends and financial modelling etc. Among them, fraud prevention and detection are one of the applications. This article applied predictive data analytics and supervised machine learning (ML) methods for card-not-present (CNP) fraud detection, and demonstrated modelling using EvalML, an auto machine learning library. This article also identified that both Decision Tree (DT) and XGBoost models work better than Linear models (LM), Random Forest (RF) and LightGBM models. The dataset used to demonstrate modelling is a large-scale dataset from Vesta which is available on Kaggle .


What's so special about CatBoost?

#artificialintelligence

CatBoost is based on gradient boosting. A new machine learning technique developed by Yandex outperforms many existing boosting algorithms like XGBoost, Light GBM. While deep learning algorithms require lots of data and computational power, boosting algorithms are still needed for most business problems. However, boosting algorithms like XGBoost takes hours to train, and sometimes you'll get frustrated…