In recent years, machine learning has been very successful in solving a wide range of problems. In particular, neural networks have reached human, and sometimes super-human, levels of ability in tasks such as language translation, object recognition, game playing, and even driving cars. With this growth in capability has come a growth in complexity. Data scientists and machine learning engineers must perform feature engineering, design model architectures, and optimize hyperparameters. Since the purpose of machine learning is to automate tasks normally done by humans, the natural next step is to automate the tasks of data scientists and engineers themselves. This area of research is called automated machine learning, or AutoML. There have been many exciting developments in AutoML recently, and it's important to take a look at the current state of the art and learn about what's happening now and what's coming up in the future. InfoQ reached out to the following subject matter experts in the industry to discuss the current state and future trends in the AutoML space. InfoQ: What is AutoML and why is it important?
Automated machine learning (AutoML) is essentially the automation of the process of applying machine learning to real-world problems. The primary goals of AutoML tools are to make machine learning available to non-experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research. Although automation and efficiency are among AutoML's main selling points, the process still requires a surprising level of human involvement. A number of vital steps in the machine learning pipeline, including understanding the attributes of domain-specific data, defining prediction problems, and creating a suitable training data set, still tend to be done manually by a data scientist on an ad hoc basis. Often, this requires a lot of back-and-forth between the data scientist and domain experts, making the whole process more difficult and inefficient. Altogether, AutoML systems are still far from a "real automatic system". In this review article, we present a level-wise taxonomic perspective on AutoML systems to date and beyond, i.e., we introduce a new classification system with seven levels that distinguishes AutoML systems by their level of autonomy. We start with a discussion of what an end-to-end machine learning pipeline actually looks like and which sub-tasks of that pipeline have been automated so far. Next, we highlight the sub-tasks that are still done manually by a data scientist in most cases and how that limits a domain expert's access to machine learning. Then, we introduce the novel level-based taxonomy of AutoML systems and define each level according to its scope of automation support. Finally, we provide a roadmap for future research in the area of AutoML and discuss some important challenges in achieving this ambitious goal.
A few days ago, Amazon announced the release of AutoGluon, a new toolkit that simplifies the creation of deep learning models with just a few lines of code. The release marks Amazon's entrance into the ultra-competitive automated machine learning (AutoML) space, which is becoming one of the hottest trends for enterprise machine learning platforms. With so much news around the AutoML ecosystem, it sometimes becomes hard to separate signal from noise. Today, I would like to explore some of the most innovative AutoML stacks on the market that don't receive much publicity.
Automated machine learning (AutoML) is becoming one of the most popular topics in modern data science applications. Often, people see AutoML as a mechanism for using out-of-the-box machine learning models without the need for sophisticated data science knowledge. While this argument makes sense in theory, the reality is a bit different. At the current stage of artificial intelligence (AI), most real-world applications require some level of machine learning knowledge. The scenarios that you can solve with a vanilla API like the Watson Developer Cloud or Microsoft Cognitive Services are very basic and represent only a small percentage of the broader spectrum of machine learning scenarios.
Data Science (DS) and Machine Learning (ML) are the backbone of today's data-driven business decision-making. From a human viewpoint, ML often consists of multiple phases, from gathering requirements and datasets, to deploying a model, to supporting human decision-making. We refer to these stages together as the DS/ML lifecycle. There are also various personas in the DS/ML team, and these personas must coordinate across the lifecycle: stakeholders set requirements, data scientists define a plan, and data engineers and ML engineers support with data cleaning and model building. Later, stakeholders verify the model, domain experts use model inferences in decision-making, and so on. Throughout the lifecycle, refinements may be performed at various stages as needed. It is such a complex and time-consuming activity that there are not enough DS/ML professionals to meet job demand, and as much as 80% of their time is spent on low-level activities such as tweaking data, trying out various algorithmic options, and tuning models. These two challenges, the dearth of data scientists and the time-consuming low-level activities, have stimulated AI researchers and system builders to explore an automated solution for DS/ML work: Automated Data Science (AutoML). Several AutoML algorithms and systems have been built to automate the various stages of the DS/ML lifecycle. For example, the ETL (extract/transform/load) task has been applied to the data readiness, pre-processing, and cleaning stage, and has attracted research attention.
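The low-level work described above, trying out algorithmic options and tuning models, is the part of the lifecycle that a search loop can take over most directly. The following is a minimal sketch of that idea, not the method of any particular AutoML system: it evaluates every combination of candidate model and parameter value with cross-validation and keeps the best one. The synthetic dataset, the two toy models, and all names (`make_data`, `SEARCH_SPACE`, `auto_select`) are assumptions introduced purely for illustration.

```python
import random
from statistics import mean


def make_data(n=200, seed=0):
    """Hypothetical 1-D binary classification data: two overlapping Gaussians."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 if label else -2.0, 1.0)
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def threshold_predict(train, x, threshold):
    """Predict class 1 when x exceeds a fixed threshold (ignores training data)."""
    return 1 if x > threshold else 0


def cv_accuracy(predict, data, param, folds=5):
    """Mean accuracy over interleaved cross-validation folds."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        correct = sum(predict(train, x, param) == y for x, y in test)
        scores.append(correct / len(test))
    return mean(scores)


# Assumed search space: each candidate model with the parameter values to try.
SEARCH_SPACE = {
    "knn": (knn_predict, [1, 3, 5, 9]),
    "threshold": (threshold_predict, [-1.0, 0.0, 1.0]),
}


def auto_select(data):
    """Exhaustively score every (model, parameter) pair and return the best."""
    best = None
    for name, (predict, params) in SEARCH_SPACE.items():
        for param in params:
            score = cv_accuracy(predict, data, param)
            if best is None or score > best[0]:
                best = (score, name, param)
    return best


score, name, param = auto_select(make_data())
print(f"best model: {name}, parameter: {param}, CV accuracy: {score:.3f}")
```

Exhaustive search is used here only because the space is tiny. With larger search spaces, the same loop structure is typically driven by random or model-based search instead, and the steps the passage lists as still manual (defining the prediction problem, assembling the training data) remain outside the loop.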