Data scientists who work within the R environment can now partake of MLflow, the open source project that Databricks released earlier this year to help manage workflows associated with machine learning development and production lifecycles. In June, Databricks co-founder and CTO Matei Zaharia unveiled MLflow as a way to automate much of the work that data scientists do when building, testing, and deploying machine learning models. The open source software was designed to fill in the gaps between the various tools, frameworks, and processes when building machine learning systems, including tracking code, packaging models, and deploying them into production. According to Databricks, MLflow allows users to package their code as reproducible runs, execute and compare hundreds of parallel experiments, on any hardware or software platform, including on premise and cloud based environments. Assistance with hyperparameter tuning is also provided.
MLflow, the open source machine learning operations (MLOps) platform created by Databricks, is becoming a Linux Foundation project. The move was announced by Matei Zaharia, co-founder of Databricks, and creator of both MLflow and Apache Spark, at the company's Spark AI Summit virtual event today. In a pre-briefing with ZDNet earlier in the week, Zaharia provided an update on MLflow's momentum, details on the new features and reasoning for moving management of the open source project from Databricks to the Linux Foundation. Momentum-wise, Zaharia said MLflow has been experiencing a 4x year-over-year growth rate. On the Databricks platform alone (including both the Amazon Web Services and Microsoft Azure offerings of the service), Zaharia said the more than 1M experiment runs are run on MLflow, and more than 100,000 ML models are added to its model registry, *each week*.
Databricks, a leader in unified analytics and founded by the original creators of Apache Spark, and RStudio, today announced a new release of MLflow, an open source multi-cloud framework for the machine learning lifecycle, now with R integration. This new integration adds to features that have already been released, making MLflow the most comprehensive open source machine learning platform, with support for multiple programming languages, integrations with popular machine learning libraries, and support for multiple clouds. Previous to MLflow, the industry did not have a standard process or end-to-end infrastructure to develop and productionize machine learning applications in a simple and consistent way. With MLflow, organizations can package their code as reproducible runs, execute and compare hundreds of parallel experiments, leverage any hardware or software platform for training, tuning, hyperparameter search and more. Additionally, organizations can deploy and manage models in production on a variety of clouds and serving platforms.
We called it Machine Learning October Fest. Last week saw the nearly synchronized breakout of a number of news centered around machine learning (ML): The release of PyTorch 1.0 beta from Facebook, fast.ai, Not accidentally, last week was also the time when Spark and AI Summit Europe took place. Its title this year has been expanded to include AI, attracting a lot of attention in the ML community. Apparently, it also works as a date around which ML announcements are scheduled.
AMSTERDAM and SAN FRANCISCO, Oct. 16, 2019 – Databricks, the leader in unified data analytics, today announced Model Registry, a new capability within MLflow, an open-source platform for the machine learning (ML) lifecycle created by Databricks. The new component enables a comprehensive model management process by providing data scientists and engineers a central repository to track, share, and collaborate on machine learning models. The Model Registry manages the full lifecycle of models and their stage transitions from experimentation to staging and deployment. Since introducing MLflow at Spark AI Summit 2018, the project has more than 140 contributors and 800,000 monthly downloads making it the leader in ML lifecycle management. "Everyone who has tried to do machine learning development knows that it is complex. The ability to manage, version and share models is critical to minimizing confusion as the number of models in experimentation, testing and production phases at any given time can span into the thousands," said Matei Zaharia, co-founder and CTO at Databricks.