In my previous blog post, "how to manage machine learning models", I explained the difficulties of developing a good machine learning model and made the case for using a tool to support data scientists with this challenge. This post first devotes one paragraph to each framework, describing the project and showing some code examples. At the end of the article you will find a framework comparison and recommendations on when to use which framework. As in my previous post, the sklearn dataset on Boston housing prices is used as the basis. You can find a notebook to play with in this GitHub repo. The notebook also includes instructions on how to install the frameworks, as well as some helper functions used in the code examples below. Those helpers are not discussed further, to keep the focus on the framework-specific parts and omit boilerplate code.

DVC stands for "data (science) version control" and aims to do for data science what git already does for software development: making development processes traceable and reproducible.
The last two layers are fully connected, and dropout is applied to each of them. Note that the dropout probability for the layers is passed into the network's constructor. This lets us flexibly pass different values for this hyperparameter, as we will see in a bit. The final layer applies a logarithmic softmax function, which gives us the log-probability for each of the 10 digit classes from MNIST. The digit with the highest (log-)probability is the one our network considers most likely to be shown in the input picture.
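To make the last step concrete, here is a minimal sketch of what a log-softmax output layer computes, written with the Python standard library only rather than a deep learning framework. The function name `log_softmax` and the example scores are assumptions made for illustration; they are not taken from the network described above. Note that log-softmax returns log-probabilities, so exponentiating the outputs recovers probabilities that sum to 1.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)  # subtract the maximum to avoid overflow in exp()
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


# Hypothetical raw scores for the 10 digit classes (0-9); not from a real model.
logits = [0.1, -1.2, 0.3, 2.5, 0.0, -0.7, 1.1, 0.4, -2.0, 0.9]

log_probs = log_softmax(logits)

# Exponentiating the log-probabilities gives ordinary probabilities.
probs = [math.exp(lp) for lp in log_probs]

# The predicted digit is the index with the highest log-probability.
predicted_digit = max(range(len(log_probs)), key=lambda i: log_probs[i])

print(predicted_digit, round(sum(probs), 6))
```

Because the logarithm is monotonic, taking the argmax over log-probabilities selects the same digit as taking it over the probabilities themselves.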
In February 2016, we introduced Databricks Community Edition, a free edition for big data developers to learn and get started quickly with Apache Spark. Since then, our commitment to fostering a community of developers has remained steadfast: to date, we have over 150K registered Community Edition users, and we have trained thousands of people at meetups, Spark AI Summits, and other open-source events. Today, we are excited to extend Databricks Community Edition with hosted MLflow for free, as part of our ongoing commitment to help developers learn about the machine learning lifecycle. With the Community Edition, you can try tutorials that demonstrate how to track results and experiments as you build machine learning models, a crucial stage in the machine learning model's development lifecycle. MLflow is an open-source platform for the machine learning lifecycle with four components: MLflow Tracking, MLflow Projects, MLflow Models, and MLflow Model Registry.
Sharing your work can be just as easy as me sharing this MLflow experiment run with you. MLflow is an awesome open-source project that has been gaining a lot of popularity lately. If you haven't heard of MLflow yet, you should definitely check it out. This blog post, written by the authors of MLflow, should give you a pretty good picture. Long story short, MLflow has the potential to become an industry standard when it comes to tracking, reproducibility, and deployment of machine learning models.