Seattle Meetup: Experiment Management for Machine Learning (Data Science Dojo)
There is a lot involved in creating and running experiments, yet the only thing we seem equipped to keep track of is the source code of the best-performing experiments, not the other configuration parameters that actually constitute an experiment.

"It was working yesterday" (a common reproducibility issue)
"I don't remember what the actual scores are, but using feature X didn't help" (a documentation issue)
"I fixed a bug, but I ran so many previous experiments with that bug" (a code dependency issue)
"I am using the same parameters as experiment 4, why is it not working" (a reproducibility and documentation issue)

In this talk I will go through the typical process that ML practitioners and data scientists follow, taking Python and scikit-learn as a use case, and the recurring issues that we are starting to see with these processes. I will describe best practices for documenting experiments to support reproducibility, and the tools and startups working in this space to close the gaps in experiment management.

Presenter Bio: Dr. Rutu Mulkar is the founder of Hunchera, and previously the founder of Ticary Solutions (acquired by Sigmoidal). She received her Ph.D. in Natural Language Processing from USC and has contributed to IBM's Watson system that defeated humans in Jeopardy!
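
The documentation and reproducibility complaints quoted in the abstract can be partly addressed by recording every run, not only the best one. The following is a minimal sketch of that idea using only the Python standard library. It is not taken from the talk: the function names (`config_fingerprint`, `log_experiment`, `load_experiments`) and the JSON Lines log format are assumptions made for illustration.

```python
import hashlib
import json
import platform
import sys
import time
from pathlib import Path


def config_fingerprint(config: dict) -> str:
    """Return a short, stable hash of an experiment configuration.

    Keys are sorted before hashing, so two dicts with the same
    parameters always map to the same identifier.
    """
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def log_experiment(log_path, config: dict, metrics: dict, notes: str = "") -> dict:
    """Append one experiment record to a JSON Lines file and return it.

    (Hypothetical helper.) The record keeps the full configuration,
    the scores, and basic environment details together in one place.
    """
    record = {
        "run_id": config_fingerprint(config),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "config": config,
        "metrics": metrics,
        "notes": notes,
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_experiments(log_path) -> list:
    """Read every logged record back, oldest first."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Calling `log_experiment("runs.jsonl", config, metrics)` after each training run leaves a searchable history, so "the same parameters as experiment 4" can be checked by comparing `run_id` values rather than by memory. A fuller setup would likely also record the code version and the data snapshot, which this sketch leaves out.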
Create Custom R Models in Azure Machine Learning
Here is where we take advantage of AzureML's newest feature: the Create R Model module. Now we can use R's randomForest library and its large number of adjustable parameters directly inside AzureML Studio. The model can then be deployed as a web service. Previously, R models were nearly impossible to deploy to the web. For a detailed explanation of setting up data partitions and model training, check out our other tutorial here.