Automated Machine Learning Techniques for Data Streams

arXiv.org Artificial Intelligence

Automated machine learning techniques have benefited from tremendous research progress recently. These developments and the continuously growing demand for machine learning experts have led to the development of numerous AutoML tools. However, these tools assume that the entire training dataset is available upfront and that the underlying distribution does not change over time. These assumptions do not hold in a data stream mining setting, where an unbounded stream of data cannot be stored and is likely to manifest concept drift. Industry applications of machine learning on streaming data are becoming more popular due to the increasing adoption of real-time streaming patterns in IoT, microservices architectures, web analytics, and other fields. The research summarized in this paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time. For comparative purposes, batch, batch incremental, and instance incremental estimators are applied and compared. Moreover, a meta-learning technique for online algorithm selection based on meta-feature extraction is proposed and compared, while model replacement and continual AutoML techniques are discussed. The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain predictive accuracy over time.
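To make the streaming setting concrete, here is a minimal sketch of test-then-train (prequential) evaluation with an instance incremental learner and a deliberately naive drift reaction. It is not the paper's method: the `OnlinePerceptron` class, the `prequential` function, the sliding-window error check, and the `window` and `drift_threshold` values are all assumptions chosen for illustration.

```python
from collections import deque


class OnlinePerceptron:
    """Instance incremental linear classifier for labels in {0, 1} (illustrative)."""

    def __init__(self, n_features, lr=0.1):
        self.n_features = n_features
        self.lr = lr
        self.reset()

    def reset(self):
        # Forget everything learned so far.
        self.w = [0.0] * self.n_features
        self.b = 0.0

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def learn_one(self, x, y):
        # Classic perceptron rule: update only on a mistake.
        err = y - self.predict(x)
        if err:
            self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * err


def prequential(stream, model, window=50, drift_threshold=0.4):
    """Test on each instance first, then train on it.

    Hypothetical drift handling: if the error rate over the last `window`
    instances exceeds `drift_threshold`, reset the model.
    Returns (overall accuracy, list of stream positions where a reset happened).
    """
    recent_errors = deque(maxlen=window)
    correct = 0
    seen = 0
    resets = []
    for i, (x, y) in enumerate(stream):
        hit = model.predict(x) == y      # test ...
        correct += hit
        seen += 1
        recent_errors.append(0 if hit else 1)
        model.learn_one(x, y)            # ... then train
        window_full = len(recent_errors) == window
        if window_full and sum(recent_errors) / window > drift_threshold:
            model.reset()
            recent_errors.clear()
            resets.append(i)
    return correct / seen, resets
```

The stream is consumed one instance at a time and never stored, which matches the unbounded-stream assumption described above. Established drift detectors are considerably more careful than a fixed error threshold; the reset here only shows where such a component would plug in.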


State of the Art in Automated Machine Learning

#artificialintelligence

In recent years, machine learning has been very successful in solving a wide range of problems. In particular, neural networks have reached human, and sometimes super-human, levels of ability in tasks such as language translation, object recognition, game playing, and even driving cars. With this growth in capability has come a growth in complexity. Data scientists and machine learning engineers must perform feature engineering, design model architectures, and optimize hyperparameters. Since the purpose of machine learning is to automate a task normally done by humans, the natural next step is to automate the tasks of data scientists and engineers. This area of research is called automated machine learning, or AutoML. There have been many exciting developments in AutoML recently, and it's important to take a look at the current state of the art and learn about what's happening now and what's coming up in the future. InfoQ reached out to the following subject matter experts in the industry to discuss the current state and future trends in the AutoML space. InfoQ: What is AutoML and why is it important? Francesca Lazzeri: AutoML is the process of automating the time-consuming, iterative tasks of machine learning model development, including model selection and hyperparameter tuning.


AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

arXiv.org Machine Learning

We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Experiments reveal that our multi-layer combination of many models offers better use of allocated training time than seeking out the best. A second contribution is an extensive evaluation of public and commercial AutoML platforms including TPOT, H2O, AutoWEKA, auto-sklearn, AutoGluon, and Google AutoML Tables. Tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is faster, more robust, and much more accurate. We find that AutoGluon often even outperforms the best-in-hindsight combination of all of its competitors. In two popular Kaggle competitions, AutoGluon beat 99% of the participating data scientists after merely 4h of training on the raw data.
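The idea of stacking models in layers can be illustrated with a small, library-free sketch. This is not AutoGluon's implementation or API: the two toy base models, the fold assignment by index, and the one-parameter second layer (a blend weight chosen by grid search) are all assumptions made to keep the example short. The point it shows is that the second layer is fitted on out-of-fold predictions, so it never sees a base model's prediction for a point that model was trained on.

```python
from statistics import mean


class MeanModel:
    """Baseline: always predicts the training mean."""

    def fit(self, xs, ys):
        self.mu = mean(ys)
        return self

    def predict(self, xs):
        return [self.mu for _ in xs]


class LineModel:
    """One-dimensional least-squares line."""

    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = cov / var if var else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs):
        return [self.slope * x + self.intercept for x in xs]


def out_of_fold(model_cls, xs, ys, k=3):
    """Predict every point with a model that did not train on it."""
    oof = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held = [i for i in range(len(xs)) if i % k == fold]
        model = model_cls().fit([xs[i] for i in train], [ys[i] for i in train])
        for i, p in zip(held, model.predict([xs[i] for i in held])):
            oof[i] = p
    return oof


def mse(preds, ys):
    return mean((p - y) ** 2 for p, y in zip(preds, ys))


def fit_stack(xs, ys, k=3):
    """Layer 1: two base models. Layer 2: a blend weight fitted on OOF predictions."""
    oof_line = out_of_fold(LineModel, xs, ys, k)
    oof_mean = out_of_fold(MeanModel, xs, ys, k)
    # Second layer (hypothetical): weight `a` on LineModel, `1 - a` on MeanModel.
    alphas = [a / 10 for a in range(11)]
    best = min(alphas, key=lambda a: mse(
        [a * l + (1 - a) * m for l, m in zip(oof_line, oof_mean)], ys))
    # Refit the base models on all data for use at prediction time.
    line = LineModel().fit(xs, ys)
    base = MeanModel().fit(xs, ys)

    def predict(new_xs):
        return [best * l + (1 - best) * m
                for l, m in zip(line.predict(new_xs), base.predict(new_xs))]

    return best, predict
```

A call such as `alpha, predict = fit_stack(xs, ys)` returns the learned blend weight and a prediction function. A multi-layer stack, as described in the abstract, would repeat the out-of-fold step with the first layer's predictions as inputs to further models rather than to a single weight.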


Amazon Gets Into the AutoML Race with AutoGluon: Some AutoML Architectures You Should Know About

#artificialintelligence

A few days ago, Amazon announced the release of AutoGluon, a new toolkit that simplifies the creation of deep learning models with just a few lines of code. The release marks Amazon's entry into the ultra-competitive automated machine learning (AutoML) space, which is becoming one of the hottest trends for enterprise machine learning platforms. With so much news around the AutoML ecosystem, it is sometimes hard to separate signal from noise. Today, I would like to explore some of the most innovative AutoML stacks on the market that don't receive much publicity. AutoML is becoming one of the most popular topics in modern data science applications.