How to version control your production machine learning models - Algorithmia Blog

#artificialintelligence

Machine learning is about rapid experimentation and iteration, and without keeping track of your modeling history you won't be able to learn much. Versioning lets you keep track of all of your models, how well they've done, and what hyperparameters you used to get there. This post will walk through why data versioning is important, which tools you can use for it, and how to version the models that go into production. If you've spent time working with Machine Learning, one thing is clear: it's an iterative process. There are so many different parts of your model--how you use your data, hyperparameters, parameters, algorithm choice, architecture--and the optimal combination of all of those is the holy grail of machine learning.
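To make the idea of tracking models, their scores, and their hyperparameters concrete, here is a minimal sketch in Python. It is not taken from the Algorithmia post: the function names (`record_model_version`, `load_versions`, `best_version`) and the JSON-lines registry file are assumptions chosen for illustration, and a real project would more likely rely on a dedicated tracking tool.

```python
import hashlib
import json
import time
from pathlib import Path


def record_model_version(registry_path, model_bytes, hyperparameters, metrics):
    """Append one version entry to a JSON-lines registry and return it.

    All names here are hypothetical; this is an illustrative sketch only.
    """
    entry = {
        # The content hash identifies the exact serialized model artifact.
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),
        "hyperparameters": hyperparameters,
        "metrics": metrics,
        "created_at": time.time(),
    }
    with Path(registry_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def load_versions(registry_path):
    """Read every recorded version back, oldest first."""
    with Path(registry_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_version(registry_path, metric):
    """Return the entry with the highest value for the given metric."""
    return max(load_versions(registry_path), key=lambda e: e["metrics"][metric])
```

Calling `record_model_version` once per training run gives an append-only history, and `best_version(path, "accuracy")` then answers which hyperparameters produced the best recorded score, assuming every entry logged that metric.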


How to version control your production Machine Learning models - Algorithmia Blog

#artificialintelligence

Machine Learning is about rapid experimentation and iteration, and without keeping track of your modeling history you won't be able to learn much. Versioning lets you keep track of all of your models, how well they've done, and what hyperparameters you used to get there. This post will walk through why versioning is important, which tools you can use for it, and how to version the models that go into production. If you've spent time working with Machine Learning, one thing is clear: it's an iterative process. There are so many different parts of your model – how you use your data, hyperparameters, parameters, algorithm choice, architecture – and the optimal combination of all of those is the holy grail of Machine Learning.


Version Control for Data Science: Tracking Machine Learning Models and Datasets

#artificialintelligence

Undoubtedly, Git is the holy grail of versioning systems! Git is great at versioning source code. But unlike software engineering projects, Data Science projects carry additional big-ass files such as datasets, trained model files, and label encodings, which can easily reach a few GB and are therefore impractical to track with Git alone. DVC helps us version large data files, much as we version source code files with Git. DVC also works on top of Git, which makes it even better!
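The general idea of keeping large files out of Git while still versioning them can be sketched as follows: store the file in a hash-addressed cache and commit only a small pointer file. This is a simplified illustration and not DVC's actual implementation or file format; the function names, the `.pointer.json` suffix, and the cache layout are all assumptions.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-GB files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track_large_file(data_path, cache_dir):
    """Copy the file into a hash-addressed cache and write a small pointer file.

    Only the pointer file would be committed to Git (hypothetical layout).
    """
    data_path = Path(data_path)
    checksum = file_sha256(data_path)
    cached = Path(cache_dir) / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_path, cached)
    pointer = data_path.with_name(data_path.name + ".pointer.json")
    pointer.write_text(json.dumps({"sha256": checksum, "path": data_path.name}))
    return pointer


def restore_large_file(pointer_path, cache_dir):
    """Recreate the data file next to its pointer from the cache."""
    pointer_path = Path(pointer_path)
    meta = json.loads(pointer_path.read_text())
    cached = Path(cache_dir) / meta["sha256"][:2] / meta["sha256"][2:]
    target = pointer_path.with_name(meta["path"])
    shutil.copy2(cached, target)
    return target
```

Because the pointer file is tiny and changes whenever the data changes, checking out an older commit and calling `restore_large_file` brings back the matching version of the dataset, provided the cache still holds it.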


Creating a solid Data Science development environment

#artificialintelligence

Conda is an environment and package manager that can replace pipenv and pip in Python. It is part of Anaconda, a Python (and R) distribution focused on Data Science. You can choose to install the full version (Anaconda, around 3 GB) or the light version (Miniconda, around 400 MB). I recommend using Miniconda, as you'll only install the libraries you need. For a broader review, please take a look at Gergely Szerovay's article on Conda.


r/MachineLearning - [R] Machine Learning Reproducibility Challenges and DVC

#artificialintelligence

When ML models need to be regularly updated in production, a host of challenges emerges. No one tool can do it all for you: organizations use a mix of Git, Makefiles, ad hoc scripts, and reference files for reproducibility.