Databricks, a leader in unified analytics founded by the original creators of Apache Spark, and RStudio today announced a new release of MLflow, an open source multi-cloud framework for the machine learning lifecycle, now with R integration. This new integration adds to features that have already been released, making MLflow the most comprehensive open source machine learning platform, with support for multiple programming languages, integrations with popular machine learning libraries, and support for multiple clouds. Before MLflow, the industry did not have a standard process or end-to-end infrastructure to develop and productionize machine learning applications in a simple and consistent way. With MLflow, organizations can package their code as reproducible runs, execute and compare hundreds of parallel experiments, and leverage any hardware or software platform for training, tuning, hyperparameter search, and more. Additionally, organizations can deploy and manage models in production on a variety of clouds and serving platforms.
Databricks today unveiled MLflow, a new open source project that aims to provide some standardization to the complex processes that data scientists oversee during the course of building, testing, and deploying machine learning models. "Everybody who has done machine learning knows that the machine learning development lifecycle is very complex," Apache Spark creator and Databricks CTO Matei Zaharia said during his keynote address at Databricks' Spark and AI Summit in San Francisco. "There are a lot of issues that come up that you don't have in normal software development lifecycle." The vast volumes of data, together with the abundance of machine learning frameworks, the large scale of production systems, and the distributed nature of data science and engineering teams, combine to create a huge number of variables to control in the machine learning DevOps lifecycle -- and that is even before the tuning. "They have all these tuning parameters that you have to change and explore to get a good model," Zaharia said.
AMSTERDAM and SAN FRANCISCO, Oct. 16, 2019 – Databricks, the leader in unified data analytics, today announced Model Registry, a new capability within MLflow, an open-source platform for the machine learning (ML) lifecycle created by Databricks. The new component enables a comprehensive model management process by providing data scientists and engineers with a central repository to track, share, and collaborate on machine learning models. The Model Registry manages the full lifecycle of models and their stage transitions from experimentation to staging and deployment. Since MLflow was introduced at Spark AI Summit 2018, the project has grown to more than 140 contributors and 800,000 monthly downloads, making it the leader in ML lifecycle management. "Everyone who has tried to do machine learning development knows that it is complex. The ability to manage, version and share models is critical to minimizing confusion as the number of models in experimentation, testing and production phases at any given time can span into the thousands," said Matei Zaharia, co-founder and CTO at Databricks.
MLflow, the open source machine learning operations (MLOps) platform created by Databricks, is becoming a Linux Foundation project. The move was announced by Matei Zaharia, co-founder of Databricks and creator of both MLflow and Apache Spark, at the company's Spark AI Summit virtual event today. In a pre-briefing with ZDNet earlier in the week, Zaharia provided an update on MLflow's momentum, details on the new features, and the reasoning for moving management of the open source project from Databricks to the Linux Foundation. Momentum-wise, Zaharia said MLflow has been experiencing a 4x year-over-year growth rate. On the Databricks platform alone (including both the Amazon Web Services and Microsoft Azure offerings of the service), Zaharia said that more than 1 million experiment runs are executed on MLflow, and more than 100,000 ML models are added to its model registry, *each week*.
Databricks, the Silicon Valley-based startup focused on commercializing Apache Spark, has developed MLflow, an open source toolkit for data scientists to manage the lifecycle of machine learning models. Unlike traditional software development, machine learning relies on a plethora of tools. For each stage involved in building a model, data scientists use at least half a dozen tools. Each stage requires extensive experimentation before settling on the right toolkit and framework. The fragmentation of tools, combined with the need to iterate rapidly, makes machine learning extremely complex.