Data Version Control: iterative machine learning

@machinelearnbot 

ML modeling is an iterative process, and it is important to keep track of your steps, the dependencies between those steps, the dependencies between your code and data files, and all the arguments your code was run with. DVC is designed to help data scientists keep track of their ML processes and file dependencies through simple git-like commands such as "dvc run python train_model.py". Your existing ML processes can be easily transformed into reproducible DVC pipelines regardless of which programming language or tool was used. This blog post walks you through an iterative process of building a machine learning model with DVC using a Stack Overflow posts dataset. It shows how a model can be improved step by step, and how DVC simplifies the iterative ML process and aids collaboration between data scientists.
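
To make the idea of tracking a step's dependencies concrete, here is a minimal Python sketch. It is not DVC's implementation and does not use DVC's API; it only illustrates the general technique of fingerprinting a step by hashing its command line together with the contents of its code and data files, so that a change to any of them marks the step as needing a rerun. The function names and the file name posts.tsv are hypothetical, chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def step_fingerprint(command, dependencies):
    """Hash a step's command line together with the contents of its input files."""
    digest = hashlib.sha256()
    digest.update(" ".join(command).encode("utf-8"))
    for path in sorted(str(p) for p in dependencies):
        digest.update(path.encode("utf-8"))
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()


def needs_rerun(command, dependencies, recorded_fingerprint):
    """A step is stale when its command, code, or data differ from the recorded run."""
    return step_fingerprint(command, dependencies) != recorded_fingerprint


def demo():
    """Record one run of a hypothetical training step, then change its inputs."""
    with tempfile.TemporaryDirectory() as workdir:
        code = Path(workdir) / "train_model.py"
        data = Path(workdir) / "posts.tsv"  # hypothetical data file name
        code.write_text("print('training')\n")
        data.write_text("id\ttitle\n1\thello\n")
        command = ["python", "train_model.py"]
        deps = [code, data]

        # Fingerprint recorded at the time of the first run.
        recorded = step_fingerprint(command, deps)

        # Nothing changed yet, so no rerun is needed.
        unchanged = needs_rerun(command, deps, recorded)

        # A new running argument changes the fingerprint.
        after_arg_change = needs_rerun(command + ["--epochs", "5"], deps, recorded)

        # New rows in the data file change the fingerprint as well.
        data.write_text("id\ttitle\n1\thello\n2\tworld\n")
        after_data_change = needs_rerun(command, deps, recorded)

        return unchanged, after_arg_change, after_data_change


if __name__ == "__main__":
    print(demo())
```

Calling demo() returns three booleans: whether a rerun is needed with nothing changed, after an argument change, and after a data change. A real tool also has to record outputs and chain steps into a pipeline, which this sketch leaves out.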