Databricks today unveiled MLflow, a new open source project that aims to bring some standardization to the complex processes that data scientists oversee while building, testing, and deploying machine learning models. "Everybody who has done machine learning knows that the machine learning development lifecycle is very complex," Apache Spark creator and Databricks CTO Matei Zaharia said during his keynote address at Databricks' Spark and AI Summit in San Francisco. "There are a lot of issues that come up that you don't have in normal software development lifecycle." The vast volumes of data, together with the abundance of machine learning frameworks, the large scale of production systems, and the distributed nature of data science and engineering teams, combine to create a huge number of variables to control in the machine learning DevOps lifecycle -- and that is even before the tuning. "They have all these tuning parameters that you have to change and explore to get a good model," Zaharia said.
Data scientists who work within the R environment can now use MLflow, the open source project that Databricks released earlier this year to help manage workflows associated with machine learning development and production lifecycles. In June, Databricks co-founder and CTO Matei Zaharia unveiled MLflow as a way to automate much of the work that data scientists do when building, testing, and deploying machine learning models. The open source software was designed to fill the gaps between the various tools, frameworks, and processes used to build machine learning systems, including tracking code, packaging models, and deploying them into production. According to Databricks, MLflow allows users to package their code as reproducible runs and to execute and compare hundreds of parallel experiments on any hardware or software platform, including on-premises and cloud-based environments. Assistance with hyperparameter tuning is also provided.
We called it Machine Learning October Fest. Last week saw the nearly synchronized breakout of a number of news items centered on machine learning (ML): The release of PyTorch 1.0 beta from Facebook, fast.ai, Not accidentally, last week was also the time when Spark and AI Summit Europe took place. Its title was expanded this year to include AI, attracting a lot of attention in the ML community. Apparently, it also works as a date around which ML announcements are scheduled.
Prescient are the entrepreneurs who predicted data would become the new oil, like Ali Ghodsi, Andy Konwinski, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, and Scott Shenker. They're the cofounders of Databricks, a San Francisco-based company that provides a suite of enterprise-focused, scalable data science and data engineering tools. Since 2013, the year Databricks opened for business, it's had no trouble attracting customers. But this week kicked the company's uninterrupted march toward market domination into high gear. Databricks this morning announced that it's closed a $400 million Series F fundraising round led by Andreessen Horowitz with participation from Microsoft, Alkeon Capital Management, BlackRock, Coatue Management, Dragoneer Investment Group, Geodesic, Green Bay Ventures, New Enterprise Associates, T. Rowe Price, and Tiger Global Management.
MLflow, the open source machine learning operations (MLOps) platform created by Databricks, is becoming a Linux Foundation project. The move was announced by Matei Zaharia, co-founder of Databricks and creator of both MLflow and Apache Spark, at the company's Spark AI Summit virtual event today. In a pre-briefing with ZDNet earlier in the week, Zaharia provided an update on MLflow's momentum, details on the new features, and the reasoning for moving management of the open source project from Databricks to the Linux Foundation. Momentum-wise, Zaharia said MLflow has been experiencing a 4x year-over-year growth rate. On the Databricks platform alone (including both the Amazon Web Services and Microsoft Azure offerings of the service), Zaharia said that more than 1 million experiment runs are executed on MLflow, and more than 100,000 ML models are added to its model registry, *each week*.