Data scientists who work within the R environment can now use MLflow, the open source project that Databricks released earlier this year to help manage workflows associated with machine learning development and production lifecycles. In June, Databricks co-founder and CTO Matei Zaharia unveiled MLflow as a way to automate much of the work that data scientists do when building, testing, and deploying machine learning models. The open source software was designed to fill the gaps between the various tools, frameworks, and processes used when building machine learning systems, including tracking code, packaging models, and deploying them into production. According to Databricks, MLflow allows users to package their code as reproducible runs and to execute and compare hundreds of parallel experiments on any hardware or software platform, including on-premises and cloud-based environments. It also provides assistance with hyperparameter tuning.
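To make the idea of tracked, comparable runs concrete, here is a minimal sketch in plain Python. It is not MLflow's API: the `Tracker` and `Run` classes, the `toy_loss` objective, and the parameter names `lr` and `depth` are all hypothetical, chosen only to show how recording the parameters and metrics of each run lets a grid of experiments be compared afterwards.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List


@dataclass
class Run:
    """One tracked experiment run: its inputs and its measured results."""
    run_id: int
    params: Dict[str, float]
    metrics: Dict[str, float] = field(default_factory=dict)


class Tracker:
    """Hypothetical in-memory run tracker (an illustration, not MLflow)."""

    def __init__(self) -> None:
        self.runs: List[Run] = []

    def start_run(self, params: Dict[str, float]) -> Run:
        # Copy the parameters so later changes by the caller do not alter the record.
        run = Run(run_id=len(self.runs) + 1, params=dict(params))
        self.runs.append(run)
        return run

    def best_run(self, metric: str, lower_is_better: bool = True) -> Run:
        scored = [r for r in self.runs if metric in r.metrics]
        if not scored:
            raise ValueError(f"no run has logged the metric {metric!r}")
        chooser = min if lower_is_better else max
        return chooser(scored, key=lambda r: r.metrics[metric])


def toy_loss(params: Dict[str, float]) -> float:
    # Stand-in for real training: the loss is smallest at lr=0.1, depth=4.
    return (params["lr"] - 0.1) ** 2 + (params["depth"] - 4) ** 2


def grid_search(
    tracker: Tracker,
    grid: Dict[str, List[float]],
    objective: Callable[[Dict[str, float]], float],
) -> Run:
    """Log one run per parameter combination and return the best one."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        run = tracker.start_run(params)
        run.metrics["loss"] = objective(params)
    return tracker.best_run("loss")
```

Calling `grid_search(Tracker(), {"lr": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}, toy_loss)` logs nine runs and returns the one with the lowest loss. A real tracking tool would also persist the runs and record code versions and artifacts, which this sketch leaves out.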
Gartner has released its 2020 Data Science and Machine Learning Platforms Magic Quadrant, and we are excited to announce that Databricks has been recognized as a Leader. Gartner evaluated 17 vendors for their completeness of vision and ability to execute. We are confident the following attributes contributed to the company's success: The biggest advantage of Databricks' Unified Data Analytics Platform is its ability to run data processing and machine learning workloads at scale and all in one place. Customers praise Databricks for significantly reducing TCO and accelerating time to value, thanks to its seamless end-to-end integration of everything from ETL to exploratory data science to production machine learning. With Databricks, data teams can build reliable data pipelines with Delta Lake, which adds reliability and performance to existing data lakes.
AMSTERDAM and SAN FRANCISCO, Oct. 16, 2019 – Databricks, the leader in unified data analytics, today announced Model Registry, a new capability within MLflow, an open-source platform for the machine learning (ML) lifecycle created by Databricks. The new component enables a comprehensive model management process by providing data scientists and engineers with a central repository to track, share, and collaborate on machine learning models. The Model Registry manages the full lifecycle of models and their stage transitions from experimentation to staging and deployment. Since its introduction at Spark AI Summit 2018, MLflow has attracted more than 140 contributors and 800,000 monthly downloads, making it the leader in ML lifecycle management. "Everyone who has tried to do machine learning development knows that it is complex. The ability to manage, version and share models is critical to minimizing confusion as the number of models in experimentation, testing and production phases at any given time can span into the thousands," said Matei Zaharia, co-founder and CTO at Databricks.
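The idea of a central registry with versioned models and stage transitions can be sketched in a few lines of plain Python. This is an illustration only, not the Model Registry's API: the `ModelRegistry` and `ModelVersion` classes, the stage names, and the rule that promoting a version to production archives the previous production version are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Stage names are assumed for this sketch.
STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class ModelVersion:
    name: str
    version: int
    source: str          # where the model artifact lives (hypothetical path)
    stage: str = "None"


class ModelRegistry:
    """Hypothetical in-memory registry (an illustration, not MLflow)."""

    def __init__(self) -> None:
        self._models: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str, source: str) -> ModelVersion:
        # Each registration under the same name gets the next version number.
        versions = self._models.setdefault(name, [])
        model_version = ModelVersion(name, len(versions) + 1, source)
        versions.append(model_version)
        return model_version

    def get(self, name: str, version: int) -> ModelVersion:
        for model_version in self._models.get(name, []):
            if model_version.version == version:
                return model_version
        raise KeyError(f"{name} has no version {version}")

    def transition(self, name: str, version: int, stage: str) -> ModelVersion:
        if stage not in STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        target = self.get(name, version)
        if stage == "Production":
            # Assumed policy: keep at most one production version per model.
            for other in self._models[name]:
                if other.stage == "Production" and other is not target:
                    other.stage = "Archived"
        target.stage = stage
        return target

    def latest(self, name: str, stage: str) -> Optional[ModelVersion]:
        matches = [m for m in self._models.get(name, []) if m.stage == stage]
        return max(matches, key=lambda m: m.version) if matches else None
```

Registering two versions of a model, promoting the first to `Production`, and then promoting the second leaves version 1 in `Archived` and version 2 in `Production`, so consumers that ask for the latest production model always get a single answer.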
Databricks today unveiled MLflow, a new open source project that aims to bring some standardization to the complex processes that data scientists oversee while building, testing, and deploying machine learning models. "Everybody who has done machine learning knows that the machine learning development lifecycle is very complex," Apache Spark creator and Databricks CTO Matei Zaharia said during his keynote address at Databricks' Spark and AI Summit in San Francisco. "There are a lot of issues that come up that you don't have in normal software development lifecycle." The vast volumes of data, together with the abundance of machine learning frameworks, the large scale of production systems, and the distributed nature of data science and engineering teams, combine to create a huge number of variables to control in the machine learning DevOps lifecycle, and that is even before the tuning. "They have all these tuning parameters that you have to change and explore to get a good model," Zaharia said.
American upstart Databricks, established by the original authors of the Apache Spark framework, reckons its open-source machine-learning management engine MLflow is ready for prime time. Version 1.0 of the platform focuses on core API components. It improves the handling of metrics and the search functionality, and adds support for Hadoop as an artifact store, in addition to the previously supported Amazon S3, Azure Blob Storage, Google Cloud Storage, SFTP, and NFS. It also adds an experimental Open Neural Network Exchange (ONNX) model flavour, and a CLI command for building a Docker image capable of serving an MLflow model. And finally, there's Windows support for the MLflow client – in the unlikely event data scientists decide to opt for something other than Linux.