Collaborating Authors: ploomber


ML Without the Ops: Running Experiments at Scale with Ploomber on AWS

#artificialintelligence

For the past couple of months, we've chatted with many Data Science and Machine Learning teams to understand their pain points. Of course, there are many of these. Still, the one that surprised me the most is how hard it is to get a simple end-to-end workflow working, partially because vendors often lock teams into complicated solutions that require a lot of setup and maintenance. This blog post will describe a simple architecture that you can use to start building data pipelines in the cloud without sacrificing your favorite tooling or incurring high maintenance costs. The solution involves using our open-source frameworks and AWS Batch.


Effective Testing for Machine Learning (Part I)

#artificialintelligence

Update: Part II is out now! This blog post series describes a strategy I've developed over the last couple of years to test Machine Learning projects effectively. Given how uncertain ML projects are, this is an incremental strategy that you can adopt as your project matures; it includes test examples to provide a clear idea of how these tests look in practice, and a complete project implemented with Ploomber is available on GitHub. By the end of the post, you'll be able to develop more robust ML pipelines. Testing Machine Learning projects is challenging. Training a model is a long-running task that may take hours to run and has a non-deterministic output, which is the opposite of what we need to test software: quick and deterministic procedures. One year ago, I published a post on testing data-intensive projects to make Continuous Integration feasible.
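To make the "quick and deterministic" contrast concrete, here is a minimal sketch (not from the post itself) of one common tactic: pin the random seed so a tiny training run produces identical output on every test run. The `train_tiny_model` function is a hypothetical stand-in for a real training routine.

```python
import random


def train_tiny_model(data, seed=0):
    # Stand-in for a real training routine: a seeded RNG makes the
    # "learned" value reproducible across runs, instead of flaky.
    rng = random.Random(seed)
    noise = rng.gauss(0, 0.01)
    return sum(data) / len(data) + noise


def test_training_is_deterministic():
    data = [1.0, 2.0, 3.0]
    # Same seed, same data -> identical output, so the test never flakes.
    assert train_tiny_model(data, seed=42) == train_tiny_model(data, seed=42)
```

With an unpinned seed the assertion would fail intermittently, which is exactly the non-determinism that makes real ML tests hard.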


Notebook meta-analysis: Jupyter as a zero-infrastructure alternative to experiment trackers

#artificialintelligence

Existing experiment trackers come with a high setup cost. To get one working, you usually have to spin up a database and run a web application. After trying multiple options, I thought that using Jupyter notebooks could be an excellent choice to store experiment results and retrieve them for comparison. This post explains how I use .ipynb files for that purpose. Machine Learning is a highly iterative process: you don't know in advance what combination of model, features, and hyperparameters will work best, so you need to make slight tweaks and evaluate performance.
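The zero-infrastructure angle rests on the fact that an .ipynb file is plain JSON, so stored cell outputs can be read back and compared without any database. The following sketch illustrates that idea; the metric-extraction logic is my own illustration, not Ploomber's API.

```python
import json


def read_metric(ipynb_path):
    """Return the first stored execute_result output in a notebook as a float.

    Illustrative only: assumes the notebook saved a numeric result
    (e.g. a model's accuracy) as a cell's execute_result output.
    """
    with open(ipynb_path) as f:
        nb = json.load(f)
    for cell in nb["cells"]:
        # Only code cells carry an "outputs" list; others get the default.
        for out in cell.get("outputs", []):
            if out["output_type"] == "execute_result":
                return float("".join(out["data"]["text/plain"]))
    raise ValueError("no execute_result output found")
```

Given a directory of executed notebooks, one per experiment, calling `read_metric` on each file is enough to rank runs, no tracking server required.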


More ODSC West 2021 Speakers Added to the Already Expert Lineup

#artificialintelligence

We can't wait for you to join us at ODSC West 2021 -- our first in-person event in 2 years. This November 16th -- 18th, we'll be gathering data scientists and speakers from around the country for three days of applied instruction. Check out the information below on some of the sessions from ODSC West 2021 speakers that you can look forward to. Learn to build more efficient models by tracking data and code changes, as well as changes in the hyperparameter values. In this workshop, you'll use the open-source tool DVC to increase reproducibility for two methods of tuning hyperparameters: grid search and random search.
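The two tuning strategies the workshop covers can be sketched in plain Python (no DVC here, and the parameter grid is made up for illustration): grid search enumerates every combination, while random search samples a fixed number of them.

```python
import itertools
import random

# Hypothetical hyperparameter grid for illustration.
grid = {"lr": [0.01, 0.1], "depth": [3, 5, 7]}


def grid_search(grid):
    # Every combination of values: here 2 * 3 = 6 candidates.
    keys = list(grid)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*grid.values())]


def random_search(grid, n_iter, seed=0):
    # Draw n_iter independent combinations; cheaper than the full
    # grid when the number of combinations is large.
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in grid.items()}
            for _ in range(n_iter)]
```

Grid search is exhaustive but its cost grows multiplicatively with each added parameter; random search trades completeness for a budget you control via `n_iter`.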