From Jupyter Notebook to DVC pipeline for reproducible ML experiments
While every data scientist has their own methods and approaches, there is one tool that nearly everyone in the field uses: Jupyter Notebook. Its ease of use makes it the perfect tool for prototyping, usually resulting in a script in which we preprocess the data, do a train/test split, train a model, and evaluate it. However, once we have a decent prototype, subsequent iterations generally leave most of the code untouched. Instead, we tend to focus on tweaking feature-engineering parameters and tuning model hyperparameters. At this point, we really start experimenting, trying to answer questions such as "What happens if I increase the learning rate?" and "What's the optimal batch size?"
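The prototype described above can be sketched as a single script. This is a minimal, illustrative example: the dataset is synthetic, and the model choice and hyperparameters (scikit-learn's `RandomForestClassifier` with `n_estimators` and `max_depth`) are assumptions standing in for whatever a real notebook would use.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Stand-in dataset; in a real notebook this would be loaded from disk
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Preprocess the features
X = StandardScaler().fit_transform(X)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train -- n_estimators and max_depth are exactly the kind of
# hyperparameters we later tweak during experimentation
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_train, y_train)

# Evaluate
print(f"accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```

Once every tweak means re-running cells and manually noting the result, this single-script shape is exactly what a DVC pipeline breaks into tracked, reproducible stages.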
Nov-29-2022, 14:41:46 GMT