

Databricks Simplifies Machine Learning Model Management at Scale with MLflow Model Registry


AMSTERDAM and SAN FRANCISCO, Oct. 16, 2019 – Databricks, the leader in unified data analytics, today announced Model Registry, a new capability within MLflow, an open-source platform for the machine learning (ML) lifecycle created by Databricks. The new component enables a comprehensive model management process by giving data scientists and engineers a central repository in which to track, share, and collaborate on machine learning models. The Model Registry manages the full lifecycle of models and their stage transitions from experimentation to staging and deployment. Since MLflow was introduced at Spark AI Summit 2018, the project has grown to more than 140 contributors and 800,000 monthly downloads, making it the leader in ML lifecycle management. "Everyone who has tried to do machine learning development knows that it is complex. The ability to manage, version and share models is critical to minimizing confusion as the number of models in experimentation, testing and production phases at any given time can span into the thousands," said Matei Zaharia, co-founder and CTO at Databricks.
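
To make the idea of versioned models moving between stages concrete, here is a minimal in-memory sketch in Python. It is not the MLflow API: the class names, method names, the "churn" model and the artifact paths are all hypothetical, and the stage names are simply taken from the wording of the announcement above.

```python
from dataclasses import dataclass, field

# Stage names borrowed from the announcement's wording (an assumption;
# the names used by the real Model Registry may differ).
STAGES = ("experimentation", "staging", "deployment")


@dataclass
class ModelVersion:
    version: int
    source: str  # hypothetical pointer to where the model artifact is stored
    stage: str = STAGES[0]


@dataclass
class ToyRegistry:
    """In-memory stand-in for a central model repository (not the MLflow API)."""

    models: dict = field(default_factory=dict)

    def register(self, name, source):
        # Each registration under the same name creates the next version number.
        versions = self.models.setdefault(name, [])
        mv = ModelVersion(version=len(versions) + 1, source=source)
        versions.append(mv)
        return mv

    def transition(self, name, version, stage):
        # Move one specific version of a model to a new lifecycle stage.
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        mv = self.models[name][version - 1]
        mv.stage = stage
        return mv

    def in_stage(self, name, stage):
        # List the version numbers of a model currently in the given stage.
        return [mv.version for mv in self.models.get(name, []) if mv.stage == stage]


registry = ToyRegistry()
registry.register("churn", "runs/a/model")   # becomes version 1
registry.register("churn", "runs/b/model")   # becomes version 2
registry.transition("churn", 2, "staging")   # promote version 2 only
```

Keeping every version under one name, each with its own stage, is what lets a team see at a glance which version is still experimental and which has been promoted.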

Open-source tools to version control Machine Learning models and experiments


AI and ML are becoming an essential part of everyday engineering and data science workflows. ML teams need new tools for data versioning, ML pipeline versioning, experiment metrics visualization, and more. Do you have the tools to successfully version data and ML pipelines, visualize experiments, and more? Come and join us for a discussion of ML best practices! In this talk, Dmitry Petrov will discuss: - The current practices of organizing ML projects using open-source tools like Git, MLflow, and

Training Machine Learning Models with MongoDB


Over the last four months, I attended an immersive data science program at Galvanize in San Francisco. As a graduation requirement, the last three weeks of the program are reserved for a student-selected project that puts to use the skills learned throughout the course. The project that I chose to tackle used natural language processing in tandem with sentiment analysis to parse and classify news articles. With the controversy surrounding our nation's media and the concept of "fake news" being floated around every corner, I decided to take a pragmatic approach to addressing bias in the media. My resulting model identified three topics within an article and classified the sentiment towards each topic.

Training Machine Learning Models On 311, 511, and 911 City Data


We have been working hard to understand the core stack of data services that make our cities work, or not work, depending on where you live. These are the data sets currently available via existing services, which may or may not exist in a machine-readable format or via an API, depending on the city you live in. There is a huge amount of data already available at the municipal level, but here is where we have started as of January.

- Real Time Streaming 311 Incidents In Chicago

511 - Traffic, Travel & Transit
- Adding 511 Data To Our Existing Transit Data Research
- Getting Your 511 Traffic Incidents in the San Francisco Bay Area as a Real Time Streaming API

911 - Emergency Events
- Making 911 Data Real Time
- Streaming 911 Emergency Data For Baltimore, MD

We've targeted these three areas because they make a difference in our lives at the local level, and because this data has huge potential when it is made available via web APIs, and in real time using Server-Sent Events (SSE). Now that we have these three critical aspects of municipal operations profiled, we are going to work to profile as many cities as we can.
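
For readers unfamiliar with Server-Sent Events, the following is a minimal sketch of how a client might parse an SSE text stream, using only the Python standard library. The `parse_sse` helper and the sample payload fields (`service`, `city`) are made up for illustration and are not taken from any of the feeds mentioned above; a real client would read the lines from an HTTP response rather than from a list.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the default event name in SSE
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line marks the end of one event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if name == "event":
            event = value
        elif name == "data":
            data.append(value)
        # Other fields such as "id" and "retry" are ignored in this sketch.


# Illustrative stream only; the JSON fields are hypothetical.
sample = [
    "event: incident\n",
    'data: {"service": "311", "city": "Chicago"}\n',
    "\n",
    ": keep-alive comment\n",
    'data: {"service": "911", "city": "Baltimore"}\n',
    "\n",
]

events = [(name, json.loads(payload)) for name, payload in parse_sse(sample)]
```

Because each event is delimited by a blank line, a consumer can process incidents one at a time as they arrive instead of polling an API for the full data set.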