Databricks today unveiled MLflow, a new open source project that aims to standardize the complex processes data scientists oversee while building, testing, and deploying machine learning models. "Everybody who has done machine learning knows that the machine learning development lifecycle is very complex," Apache Spark creator and Databricks CTO Matei Zaharia said during his keynote address at Databricks' Spark and AI Summit in San Francisco. "There are a lot of issues that come up that you don't have in normal software development lifecycle." The vast volumes of data, the abundance of machine learning frameworks, the large scale of production systems, and the distributed nature of data science and engineering teams combine to create a huge number of variables to control in the machine learning DevOps lifecycle, and that is before any tuning begins. "They have all these tuning parameters that you have to change and explore to get a good model," Zaharia said.
Databricks, a leader in unified analytics founded by the original creators of Apache Spark, and RStudio today announced a new release of MLflow, an open source multi-cloud framework for the machine learning lifecycle, now with R integration. The new integration adds to features that have already been released, making MLflow the most comprehensive open source machine learning platform, with support for multiple programming languages, integrations with popular machine learning libraries, and support for multiple clouds. Before MLflow, the industry did not have a standard process or end-to-end infrastructure for developing and productionizing machine learning applications in a simple and consistent way. With MLflow, organizations can package their code as reproducible runs, execute and compare hundreds of parallel experiments, and use any hardware or software platform for training, tuning, hyperparameter search, and more. Additionally, organizations can deploy and manage models in production on a variety of clouds and serving platforms.
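To make the idea of executing and comparing many experiments concrete, here is a minimal sketch of run tracking using only the Python standard library. It is not MLflow's API. The helper names (`record_run`, `load_runs`, `best_run`), the JSON-file-per-run layout, and the toy `train_and_score` function are all assumptions made for illustration. Each run's parameters and metrics are written to a store so that runs from a small hyperparameter grid can be compared afterwards.

```python
import itertools
import json
import tempfile
import uuid
from pathlib import Path


def train_and_score(learning_rate: float, depth: int) -> float:
    """Hypothetical stand-in for real training; returns a made-up validation error."""
    return (learning_rate - 0.1) ** 2 + (depth - 4) ** 2 * 0.01


def record_run(store: Path, params: dict, metrics: dict) -> str:
    """Write one run's parameters and metrics to its own JSON file."""
    run_id = uuid.uuid4().hex
    payload = {"run_id": run_id, "params": params, "metrics": metrics}
    (store / f"{run_id}.json").write_text(json.dumps(payload))
    return run_id


def load_runs(store: Path) -> list:
    """Read every recorded run back from the store."""
    return [json.loads(path.read_text()) for path in sorted(store.glob("*.json"))]


def best_run(runs: list, metric: str) -> dict:
    """Return the run with the lowest value of the given metric."""
    return min(runs, key=lambda run: run["metrics"][metric])


# Explore a small hyperparameter grid, recording every run (assumed values).
store = Path(tempfile.mkdtemp(prefix="runs_"))
for learning_rate, depth in itertools.product([0.01, 0.1, 1.0], [2, 4, 8]):
    params = {"learning_rate": learning_rate, "depth": depth}
    metrics = {"val_error": train_and_score(learning_rate, depth)}
    record_run(store, params, metrics)

# Compare all recorded runs and pick the one with the lowest error.
runs = load_runs(store)
winner = best_run(runs, "val_error")
print(len(runs), winner["params"])
```

Because every run is stored with the parameters that produced it, the winning configuration can be looked up and rerun later. A tracking tool provides the same record-and-compare pattern without hand-written bookkeeping.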
Successfully building and deploying a machine learning model can be difficult to do once. Enabling other data scientists (or yourself) to reproduce your pipeline, compare the results of different versions, track what's running where, and redeploy and roll back updated models is much harder. In this eBook, we'll explore in greater depth what makes the ML lifecycle so challenging compared to the traditional software development lifecycle, and share the Databricks approach to addressing these challenges. The eBook covers the key challenges organizations face when managing ML models throughout their lifecycle and how to overcome them. It also explains how MLflow, an open source framework unveiled by Databricks, can help address these challenges, specifically around experiment tracking, project reproducibility, and model deployment.
Machine learning brings new complexities beyond the traditional software development lifecycle. To address these challenges, Databricks unveiled MLflow, an open source project aimed at simplifying the entire machine learning lifecycle. MLflow allows companies of all sizes to accelerate the machine learning lifecycle by introducing simple abstractions to package reproducible projects, track results, and encapsulate models. With it, teams can keep track of experiment runs and results across frameworks, execute projects remotely on a Databricks cluster, and quickly reproduce their runs.
In recent years, machine learning has become ubiquitous in industry and production environments. Both academic and industry institutions had previously focused on training and producing models, but the focus has shifted to productionizing the trained models. More and more machine learning practitioners are now looking for the right model deployment options. In most scenarios, deployment means shipping the trained model to a system that makes predictions on unseen real-time or batch data and serves those predictions to an end user, again either in real time or in batches. This is easier said than done.
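The difference between real-time and batch prediction can be sketched in a few lines of Python. This is an illustrative sketch only: `load_model`, `predict_one`, and `predict_batch` are hypothetical names, and the "trained model" is a fixed linear scorer with made-up weights standing in for whatever a real training job would produce.

```python
from typing import Callable, Iterable, List

Features = List[float]
Model = Callable[[Features], float]


def load_model() -> Model:
    """Hypothetical stand-in for loading a trained model from storage."""
    weights = [0.5, -1.0, 2.0]  # assumed weights, for illustration only
    return lambda features: sum(w * x for w, x in zip(weights, features))


def predict_one(model: Model, features: Features) -> float:
    """Real-time path: score a single request as it arrives."""
    return model(features)


def predict_batch(model: Model, rows: Iterable[Features]) -> List[float]:
    """Batch path: score a whole collection of rows in one pass."""
    return [model(row) for row in rows]


model = load_model()

# Real-time: one request in, one prediction out.
single = predict_one(model, [2.0, 1.0, 0.5])

# Batch: many stored rows scored together, for example on a schedule.
many = predict_batch(model, [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(single, many)
```

Both paths call the same model object. What differs in practice is everything around it, such as latency requirements, how requests arrive, and where the predictions are written, which is a large part of why deployment is harder than it sounds.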