Data Version Control: iterative machine learning
It is hardly possible in real life to develop a good machine learning model in a single pass. ML modeling is an iterative process, and it is extremely important to keep track of your steps, the dependencies between them, the dependencies between your code and data files, and all the arguments your code is run with. This becomes even more important and complicated in a team environment, where collaboration between data scientists takes up a serious share of the team's effort. Today, we are pleased to announce the beta release of a new open source tool: Data Version Control, or DVC. DVC is designed to help data scientists keep track of their ML processes and file dependencies through simple git-like commands such as "dvc run python train_model.py". Your existing ML processes can be easily transformed into reproducible DVC pipelines regardless of which programming language or tool was used.
May-12-2017, 22:45:35 GMT