As we already know, machine learning is a way of automating complex problem-solving. But can machine learning itself be automated? That's what we'll explore in this article. By the end, we'll have answered that question and shown practical ways it can be accomplished. When applying machine learning models, we usually perform data pre-processing, feature engineering, feature extraction, and feature selection.
Integrating ML models into software is of growing interest. Building accurate models requires the right choice of hyperparameters for training procedures (learners) when the training dataset is given. AutoML tools provide APIs to automate this choice, which usually involves many trials of different hyperparameters for a given training dataset. Since training and evaluating complex models can be time- and resource-consuming, existing AutoML solutions require a long time or large resources to produce accurate models for large-scale training data. That prevents AutoML from being embedded in software that needs to repeatedly tune hyperparameters and produce models to be consumed by other components, such as large-scale data systems. We present a fast and lightweight hyperparameter optimization method, FLO, and use it to build an efficient AutoML solution. Our method optimizes for minimal evaluation cost instead of the number of iterations needed to find accurate models. Our main idea is to leverage a holistic consideration of the relations among model complexity, evaluation cost, and accuracy. FLO has strong anytime performance and significantly outperforms Bayesian Optimization and random search for hyperparameter tuning on a large open-source AutoML benchmark. Our AutoML solution also outperforms top-ranked AutoML libraries on a majority of the tasks in this benchmark.
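To make the idea of budgeting by evaluation cost rather than by iteration count concrete, here is a minimal sketch in Python. It is not the FLO algorithm: the `error` and `cost` functions, the single `n_trees` hyperparameter, and the halve-or-double proposal rule are all assumptions invented for illustration. The sketch starts from the cheapest configuration and stops once the next trial would exceed a total cost budget.

```python
import random
from typing import Tuple


def error(n_trees: int) -> float:
    """Toy validation error, lowest at 64 trees (invented for illustration)."""
    return abs(n_trees - 64) / 64 + 0.1


def cost(n_trees: int) -> float:
    """Toy evaluation cost that grows linearly with model size."""
    return float(n_trees)


def cost_aware_search(budget: float, seed: int = 0) -> Tuple[int, float, float]:
    """Local search from the cheapest config, stopped by a total-cost budget."""
    rng = random.Random(seed)
    best_cfg = 1                       # start with the cheapest model
    best_err = error(best_cfg)
    spent = cost(best_cfg)             # the first evaluation is always paid
    while True:
        # Propose a neighbor by halving or doubling the model size.
        candidate = max(1, int(best_cfg * rng.choice([0.5, 2.0])))
        if spent + cost(candidate) > budget:
            break                      # the next trial would exceed the budget
        spent += cost(candidate)
        cand_err = error(candidate)
        if cand_err < best_err:        # move only when the error improves
            best_cfg, best_err = candidate, cand_err
    return best_cfg, best_err, spent


best_cfg, best_err, spent = cost_aware_search(budget=500.0)
print(best_cfg, round(best_err, 3), spent)
```

Because cheap configurations are tried first, the search has a usable model after spending very little, which is one simple way to think about anytime performance. A fixed iteration count would instead treat a 1-tree trial and a 128-tree trial as equally expensive.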
Using recent methods such as Bayesian Optimization, auto-sklearn is built to navigate the space of possible models and learns to infer whether a specific configuration will work well on a given task. The library was created by Matthias Feurer et al., and its technical details are described in the paper Efficient and Robust Automated Machine Learning. In addition to discovering data preparation steps and model selections for a dataset, it learns from models that perform well on similar datasets. Top-performing models are aggregated in an ensemble. On top of an efficient implementation, auto-sklearn requires minimal user interaction.
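One common way to aggregate top-performing models is greedy forward ensemble selection: repeatedly add, with replacement, whichever model most improves the averaged prediction on validation data. The sketch below illustrates that idea in plain Python. It is not auto-sklearn's code; the model names `A`, `B`, `C`, their validation probabilities, and the helper functions are hypothetical.

```python
from typing import Dict, List


def accuracy(probs: List[float], labels: List[int]) -> float:
    """Fraction of binary labels matched when thresholding at 0.5."""
    preds = [1 if p >= 0.5 else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def average(rows: List[List[float]]) -> List[float]:
    """Element-wise mean of several models' predicted probabilities."""
    return [sum(col) / len(col) for col in zip(*rows)]


def greedy_ensemble(model_probs: Dict[str, List[float]],
                    labels: List[int], rounds: int) -> List[str]:
    """Pick `rounds` models (repeats allowed) that maximize validation accuracy."""
    chosen: List[str] = []
    for _ in range(rounds):
        best_name, best_score = "", -1.0
        for name in sorted(model_probs):          # fixed order breaks ties
            trial = [model_probs[n] for n in chosen + [name]]
            score = accuracy(average(trial), labels)
            if score > best_score:
                best_name, best_score = name, score
        chosen.append(best_name)
    return chosen


# Hypothetical validation-set probabilities from three already-trained models.
labels = [1, 0, 1, 0]
model_probs = {
    "A": [0.9, 0.6, 0.8, 0.2],
    "B": [0.6, 0.2, 0.4, 0.3],
    "C": [0.4, 0.7, 0.3, 0.6],
}
members = greedy_ensemble(model_probs, labels, rounds=2)
print(members)
```

In this toy data, models A and B each misclassify a different example, so averaging them corrects both mistakes. That complementarity is the reason an ensemble can beat its best single member.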
GitHub is a clearinghouse for all sorts of open source projects, including those for machine learning, automated and otherwise. More specifically, automated machine learning is the use of automated techniques, be they learned methods or simple heuristics, for algorithm selection, hyperparameter tuning, architecture design, or any other conceivable portion of a machine learning implementation. Switching gears, Indiana Jones is one of the greatest characters to ever grace the silver screen. Raiders of the Lost Ark, the first movie in which the character was featured, is a personal favorite and a film adored by millions. The rest of the movies in the (current) quadrilogy run alternately hot and cold, but even the poorest-quality Indiana Jones movie is better than 95% of available cinema.
We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Experiments reveal that our multi-layer combination of many models offers better use of allocated training time than seeking out the best. A second contribution is an extensive evaluation of public and commercial AutoML platforms including TPOT, H2O, AutoWEKA, auto-sklearn, AutoGluon, and Google AutoML Tables. Tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is faster, more robust, and much more accurate. We find that AutoGluon often even outperforms the best-in-hindsight combination of all of its competitors. In two popular Kaggle competitions, AutoGluon beat 99% of the participating data scientists after merely 4h of training on the raw data.