Approaching (Almost) Any Machine Learning Problem

#artificialintelligence

Some say 60-70% of the time is spent on data cleaning, munging, and bringing data into a format suitable for applying machine learning models. This post focuses on the second part, i.e., applying machine learning models, including the preprocessing steps. The pipelines discussed here are the result of over a hundred machine learning competitions that I've taken part in. Note that the discussion is quite general yet very useful, and far more sophisticated methods also exist and are practised by professionals. Before applying machine learning models, the data must be converted to a tabular form.
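As a hedged illustration of what such a preprocessing step might look like (the column names, imputers, and encoders below are assumptions for the sketch, not taken from the post), a scikit-learn pipeline can bring mixed numeric and categorical data into the tabular, fully numeric form a model expects:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data with numeric and categorical columns and missing values
df = pd.DataFrame({
    "age": [25, 32, np.nan, 47],
    "city": ["NY", "SF", "NY", np.nan],
    "label": [0, 1, 0, 1],
})

numeric_cols = ["age"]
categorical_cols = ["city"]

preprocess = ColumnTransformer([
    # impute missing numerics with the median, then standardize
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    # impute missing categoricals with the mode, then one-hot encode
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

X = preprocess.fit_transform(df.drop(columns=["label"]))
y = df["label"].values
print(X.shape)  # tabular, fully numeric feature matrix ready for a model
```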


How to Develop Your First XGBoost Model in Python with scikit-learn - Machine Learning Mastery

#artificialintelligence

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance that is dominating competitive machine learning. In this post you will discover how you can install and create your first XGBoost model in Python. XGBoost is the high-performance implementation of gradient boosting that you can now access directly in Python. Assuming you have a working SciPy environment, XGBoost can be installed easily using pip.
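As a rough sketch of the workflow the post describes (the dataset and hyperparameters here are illustrative assumptions, not taken from the post), a first model can be installed and trained like this:

```python
# pip install xgboost scikit-learn
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Illustrative binary classification dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=7
)

# Fit a first XGBoost model with simple settings
model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)

# Evaluate on the held-out split
y_pred = model.predict(X_test)
print("Accuracy: %.2f%%" % (accuracy_score(y_test, y_pred) * 100.0))
```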


How to Tune the Number and Size of Decision Trees with XGBoost in Python - Machine Learning Mastery

#artificialintelligence

Gradient boosting involves creating and adding decision trees sequentially, each attempting to correct the mistakes of the learners that came before it. This raises the question of how many trees (weak learners or estimators) to configure in your gradient boosting model and how big each tree should be. In this post you will discover how to design a systematic experiment to select the number and size of decision trees to use on your problem. XGBoost is the high-performance implementation of gradient boosting that you can now access directly in Python.
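One way such a systematic experiment might look (a sketch using a scikit-learn grid search; the parameter ranges and dataset are illustrative assumptions, not the post's):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

# Grid over the number of trees and the depth (size) of each tree
param_grid = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [2, 4, 6, 8],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
grid = GridSearchCV(
    XGBClassifier(learning_rate=0.1),
    param_grid,
    scoring="neg_log_loss",
    cv=cv,
    n_jobs=-1,
)
grid.fit(X, y)

# Report the best combination found by cross-validation
print("Best: %f using %s" % (grid.best_score_, grid.best_params_))
```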


Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost

arXiv.org Machine Learning

The paper presents Imbalance-XGBoost, a Python package that combines the powerful XGBoost software with weighted and focal losses to tackle binary label-imbalanced classification tasks. Though a small-scale program in terms of size, the package is, to the best of the authors' knowledge, the first of its kind which provides an integrated implementation of the two losses on XGBoost and brings a general-purpose extension of XGBoost for label-imbalanced scenarios. In this paper, the design and usage of the package are described with exemplar code listings, and its convenience for integration into Python-driven machine learning projects is illustrated. Furthermore, as the first- and second-order derivatives of the loss functions are essential for the implementations, the algebraic derivation is discussed and can be deemed a separate algorithmic contribution. The performance of the algorithms implemented in the package is empirically evaluated on a Parkinson's disease classification data set, and multiple state-of-the-art results have been observed. Given the scalable nature of XGBoost, the package has great potential to be applied to real-life binary classification tasks, which are usually large-scale and label-imbalanced.
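To give a flavour of why the first- and second-order derivatives matter, here is a generic sketch of a weighted cross-entropy objective plugged into plain XGBoost's custom-objective hook (this is not the Imbalance-XGBoost package's own API, and the weight value is an assumption): a custom objective must return exactly those two quantities per example.

```python
import numpy as np
import xgboost as xgb

def make_weighted_logloss(w):
    """Cross-entropy where errors on the positive (minority) class are scaled by w."""
    def objective(preds, dtrain):
        y = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))   # preds are raw margins for custom objectives
        scale = 1.0 - y + w * y            # w for positives, 1 for negatives
        grad = p * scale - w * y           # first-order derivative of the loss w.r.t. the margin
        hess = p * (1.0 - p) * scale       # second-order derivative
        return grad, hess
    return objective

# Hypothetical usage with the low-level API (X_train, y_train assumed to exist):
# dtrain = xgb.DMatrix(X_train, label=y_train)
# booster = xgb.train({"max_depth": 4, "eta": 0.1}, dtrain,
#                     num_boost_round=200, obj=make_weighted_logloss(4.0))
```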

