mlbox
eTOP: Early Termination of Pipelines for Faster Training of AutoML Systems
Zhang, Haoxiang, Freire, Juliana, Garg, Yash
Recent advancements in software and hardware technologies have enabled the use of AI/ML models in everyday applications, which has significantly improved the quality of service rendered. However, for a given application, finding the right AI/ML model is a complex and costly process that involves the generation, training, and evaluation of multiple interlinked steps (called pipelines), such as data pre-processing, feature engineering, selection, and model tuning. These pipelines are complex (in structure) and costly (both in compute resources and time) to execute end-to-end, with a hyper-parameter associated with each step. AutoML systems automate the search of these hyper-parameters but are slow, as they rely on optimizing the pipeline's end output. We propose the eTOP framework, which works on top of any AutoML system and decides whether to execute the pipeline to the end or terminate it at an intermediate step. Experimental evaluation on 26 benchmark datasets, with eTOP integrated into MLBox, shows that it reduces the training time of the AutoML system by up to 40x compared to baseline MLBox.
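The idea of stopping a pipeline at an intermediate step can be illustrated with a minimal sketch. This is not the eTOP algorithm itself: the step functions, the `score_intermediate` check, and the `threshold` value are all hypothetical stand-ins chosen for illustration, and the abstract does not say how eTOP makes its decision.

```python
from typing import Callable, List, Tuple

Step = Callable[[List[float]], List[float]]


def run_with_early_termination(
    data: List[float],
    steps: List[Step],
    score_intermediate: Callable[[List[float]], float],
    threshold: float,
) -> Tuple[List[float], int, bool]:
    """Run pipeline steps in order and stop early if an intermediate
    output scores below the threshold.

    Returns (last output, number of steps executed, completed flag).
    """
    executed = 0
    for step in steps:
        data = step(data)
        executed += 1
        # Only check intermediate outputs; the final output is always kept.
        if executed < len(steps) and score_intermediate(data) < threshold:
            return data, executed, False
    return data, executed, True


# Hypothetical toy pipeline: filter, scale, transform.
def drop_negatives(xs: List[float]) -> List[float]:
    return [x for x in xs if x >= 0]


def scale(xs: List[float]) -> List[float]:
    return [x / 10 for x in xs]


def square(xs: List[float]) -> List[float]:
    return [x * x for x in xs]


def rows_remaining(xs: List[float]) -> float:
    # Assumed quality signal: how many rows survive so far.
    return float(len(xs))


pipeline = [drop_negatives, scale, square]

# Enough rows survive the first step, so the pipeline runs to the end.
full_run = run_with_early_termination([1, 2, 3, -1], pipeline, rows_remaining, 2)

# Only one row survives the first step, so the run is cut short there.
short_run = run_with_early_termination([-1, -2, 3], pipeline, rows_remaining, 2)
```

In this toy setup the second run skips the two remaining steps, which is where the time saving would come from when the later steps (for example model training) are the expensive ones.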
Best ML Github Repos To Checkout
Once in a while every data scientist needs some inspiration. Or maybe you just want to learn new things or see what's going on in the awesome field called Machine Learning. GitHub hosts a lot of brilliant, well-crafted ML repos. Here's just a fraction of what GitHub has to offer. I hope you will enjoy it, so let's get started!
Introduction to AutoML with MLBox
Today's post is very special. It's written in collaboration with Axel de Romblay, the author of the MLBox Auto-ML package, which has gained a lot of popularity in recent years. If you haven't heard about this library, go and check it out on GitHub. It encompasses interesting features, it's gaining in maturity, and it is now under active development. In this post, we'll show you how you can easily use it to train an automated machine learning pipeline for a classification problem. We'll start off by loading and cleaning the data, then remove drift, launch a strong pipeline of accelerated optimization, and generate predictions.
11 most read Machine Learning articles from Analytics Vidhya in 2017 - Analytics Vidhya
These curated articles will be a one-stop solution for people who are getting started with Machine Learning or who are already working with it. This article collects the best articles of 2017 that gathered the interest of the Machine Learning community. As in the previous article, "Best Deep Learning articles in 2017", I have added the tool used and the level of difficulty for each article to help you choose. If you wish to include any other learning resource or article here, please mention it in the comments. A large amount of the unstructured data present today is in the form of text, for example: medical documents, legal agreements, tweets, blogs, newspapers, chat conversations, etc.
Tutorial on Automated Machine Learning using MLBox
Recently, one of my friends and I were solving a practice problem. After 8 hours of hard work & coding, my friend Shubham got a score of 1153 (position 219). How did I get there? What if I tell you there exists a library called MLBox, which does most of the heavy lifting in machine learning for you in minimal lines of code? From missing value imputation to feature engineering using state-of-the-art Entity Embeddings for categorical features, MLBox has it all.