For the python code, I used the Iris dataset which is available within the Scikit-learn package. It is a very small dataset (150 rows only) with a multiclass classification problem. As we are mostly focussing on hyperparameter tuning, I have not performed the EDA(exploratory data analysis) or feature engineering part and directly jumped into the model-building. I used the XGBoostClssifier algorithm for the model-building to classify the target variables. Then, we pass predefined values for hyperparameters to the GridSearchCV function.
Automated Machine Learning (AutoML) refers to techniques for automatically discovering well-performing models for predictive modeling tasks with very little user involvement. HyperOpt is an open-source library for large scale AutoML and HyperOpt-Sklearn is a wrapper for HyperOpt that supports AutoML with HyperOpt for the popular Scikit-Learn machine learning library, including the suite of data preparation transforms and classification and regression algorithms. In this tutorial, you will discover how to use HyperOpt for automatic machine learning with Scikit-Learn in Python. HyperOpt for Automated Machine Learning With Scikit-Learn Photo by Neil Williamson, some rights reserved. HyperOpt is an open-source Python library for Bayesian optimization developed by James Bergstra.
Most Professional Machine Learning practitioners follow the ML Pipeline as a standard, to keep their work efficient and to keep the flow of work. A pipeline is created to allow data flow from its raw format to some useful information. All sub-fields in this pipeline's modules are equally important for us to produce quality results, and one of them is Hyper-Parameter Tuning. Most of us know the best way to proceed with Hyper-Parameter Tuning is to use the GridSearchCV or RandomSearchCV from the sklearn module. But apart from these algorithms, there are many other Advanced methods for Hyper-Parameter Tuning.
When working on a machine learning project, you need to follow a series of steps until you reach your goal. One of the steps you have to perform is hyperparameter optimization on your selected model. This task always comes after the model selection process where you choose the model that is performing better than other models. Before I define hyperparameter optimization, you need to understand what a hyperparameter is. In short, hyperparameters are different parameter values that are used to control the learning process and have a significant effect on the performance of machine learning models. These parameters are tunable and can directly affect how well a model trains. So then hyperparameter optimization is the process of finding the right combination of hyperparameter values to achieve maximum performance on the data in a reasonable amount of time.
Tuning hyperparameters for machine learning algorithms is a tedious task, one that is typically done manually. To enable automated hyperparameter tuning, recent works have started to use techniques based on Bayesian optimization. However, to practically enable automated tuning for large scale machine learning training pipelines, significant gaps remain in existing libraries, including lack of abstractions, fault tolerance, and flexibility to support scheduling on any distributed computing framework. To address these challenges, we present Mango, a Python library for parallel hyperparameter tuning. Mango enables the use of any distributed scheduling framework, implements intelligent parallel search strategies, and provides rich abstractions for defining complex hyperparameter search spaces that are compatible with scikit-learn. Mango is comparable in performance to Hyperopt, another widely used library. Mango is available open-source and is currently used in production at Arm Research to provide state-of-art hyperparameter tuning capabilities.