Although machine learning (ML) can produce fantastic results, using it in practice is complex. Beyond the usual challenges in software development, machine learning developers face new challenges, including experiment management (tracking which parameters, code, and data went into a result); reproducibility (running the same code and environment later); model deployment into production; and governance (auditing models and data used throughout an organization). These workflow challenges around the ML lifecycle are often the top obstacle to using ML in production and scaling it up within an organization.
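Reproducibility in the sense above means rerunning the same code in the same environment and getting the same result. Below is a minimal sketch of two ingredients, assuming only Python's standard library: pin the source of randomness with an explicit seed, and record the environment alongside the result. `train_toy_model` and `run_manifest` are hypothetical helpers standing in for a real training job; they are not part of any ML framework.

```python
import json
import platform
import random
import sys


def capture_environment() -> dict:
    """Record interpreter and OS details that a later rerun should match."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "os": platform.system(),
    }


def train_toy_model(seed: int, n: int = 100) -> float:
    """Hypothetical stand-in for training: the mean of n seeded random draws."""
    rng = random.Random(seed)  # private RNG: the output depends only on `seed`
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(draws) / n


def run_manifest(seed: int) -> str:
    """Bundle seed, environment, and result so the run can be repeated later."""
    return json.dumps(
        {"seed": seed, "env": capture_environment(), "result": train_toy_model(seed)},
        sort_keys=True,
    )


# Rerunning with the same seed in the same environment gives the same result.
print(train_toy_model(seed=42) == train_toy_model(seed=42))  # → True
print(run_manifest(42))
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps other code that draws random numbers from silently changing this run's result.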
Everyone who has tried machine learning development knows that it is complex. Beyond the usual concerns of software development, machine learning (ML) development brings several new challenges. Experiments are hard to track: machine learning algorithms have dozens of configurable parameters, and whether you work alone or on a team, it is difficult to keep track of which parameters, code, and data went into each experiment to produce a model. Results are also hard to reproduce.
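To make the tracking problem concrete, here is a minimal sketch, assuming Python and only its standard library, of what each experiment record needs to capture: the parameters, a code version, a fingerprint of the data, and the resulting metrics. The `ExperimentLog` and `RunRecord` names and the `git:abc123` version string are hypothetical. This is not MLflow's API, although MLflow's tracking component records the same kinds of information (for example through `mlflow.log_param` and `mlflow.log_metric`).

```python
import hashlib
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """One experiment: what went in (params, code, data) and what came out."""
    params: dict
    code_version: str
    data_digest: str
    metrics: dict = field(default_factory=dict)
    started_at: float = field(default_factory=time.time)


def digest(data: bytes) -> str:
    """Content hash, so any change to the training data yields a different ID."""
    return hashlib.sha256(data).hexdigest()[:16]


class ExperimentLog:
    """Hypothetical append-only log of runs, kept in memory for this sketch."""

    def __init__(self) -> None:
        self.runs = []  # list of RunRecord, in the order they were logged

    def log_run(self, params, code_version, data, metrics) -> RunRecord:
        record = RunRecord(dict(params), code_version, digest(data), dict(metrics))
        self.runs.append(record)
        return record

    def best(self, metric: str) -> RunRecord:
        """Return the run with the highest value of `metric`."""
        return max(self.runs, key=lambda r: r.metrics[metric])

    def to_json(self) -> str:
        return json.dumps([asdict(r) for r in self.runs], indent=2)


log = ExperimentLog()
data = b"feature1,feature2,label\n1,2,0\n3,4,1\n"
# "git:abc123" is a placeholder for whatever identifies the code revision.
log.log_run({"lr": 0.1, "depth": 3}, "git:abc123", data, {"accuracy": 0.81})
log.log_run({"lr": 0.01, "depth": 5}, "git:abc123", data, {"accuracy": 0.86})
print(log.best("accuracy").params)  # → {'lr': 0.01, 'depth': 5}
```

Hashing the data rather than storing only its path means a silently edited file shows up as a different `data_digest`, which is exactly the kind of drift that makes results hard to reproduce.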
Databricks, the Silicon Valley-based startup focused on commercializing Apache Spark, has developed MLflow, an open source toolkit that helps data scientists manage the lifecycle of machine learning models. Unlike traditional software development, machine learning relies on a plethora of tools: for each stage of building a model, data scientists may use half a dozen or more of them, and each stage requires extensive experimentation before settling on the right toolkit and framework. This fragmentation of tools, combined with the need to iterate rapidly, makes machine learning development extremely complex.
Machine learning is one of the most popular technologies of this decade. But as ML gains acceptance and adoption, managing ML projects is becoming more complex as well. Unlike traditional software development, ML is fundamentally about experimentation. Each stage of the ML pipeline has a plethora of tools and open source projects to choose from, and the training, hyperparameter tuning, scoring, and evaluation of a model are often repeated until the results are satisfactory.
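The repeat-until-satisfactory loop described above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not how any particular tool implements tuning: the toy dataset, the one-feature logistic regression trained with plain SGD, the hyperparameter grid, and the 0.85 accuracy target are all hypothetical choices made for the example.

```python
import itertools
import math
import random


def make_data(n, seed):
    """Hypothetical toy data: label is 1 when x > 0.3, with ~10% of labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = int(x > 0.3)
        if rng.random() < 0.1:  # label noise
            y = 1 - y
        data.append((x, y))
    return data


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, lr, epochs):
    """Fit 1-D logistic regression with plain stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            error = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            w -= lr * error * x
            b -= lr * error
    return w, b


def accuracy(model, data):
    """Score: fraction of points classified correctly."""
    w, b = model
    return sum(int(w * x + b > 0) == y for x, y in data) / len(data)


def tune(train_data, val_data, grid, target):
    """Train and evaluate each setting; stop early once `target` is reached."""
    best = None
    for lr, epochs in itertools.product(grid["lr"], grid["epochs"]):
        model = train(train_data, lr, epochs)
        score = accuracy(model, val_data)
        if best is None or score > best[0]:
            best = (score, lr, epochs)
        if score >= target:  # "satisfactory" is an assumed threshold here
            break
    return best


train_set = make_data(200, seed=1)
val_set = make_data(100, seed=2)
grid = {"lr": [0.01, 0.1, 1.0], "epochs": [5, 20]}
best_score, best_lr, best_epochs = tune(train_set, val_set, grid, target=0.85)
print(f"best validation accuracy {best_score:.2f} with lr={best_lr}, epochs={best_epochs}")
```

Scoring on a separate validation split, rather than on the training data, keeps the loop from rewarding settings that merely memorize the training set. Each pass through the loop is also exactly the kind of run whose parameters and metrics need to be tracked.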