

Training Machine Learning Models On 311, 511, and 911 City Data


We have been working hard to understand the core stack of data services that make our cities work, or not work, depending on where you live. These are the data sets currently available through existing services; depending on the city you live in, they may or may not exist in a machine-readable format or be exposed through an API. There is a huge amount of data already available at the municipal level, but here is where we have started as of January:

311
Real Time Streaming 311 Incidents In Chicago

511 - Traffic, Travel & Transit
Adding 511 Data To Our Existing Transit Data Research
Getting Your 511 Traffic Incidents in the San Francisco Bay Area as a Real Time Streaming API

911 - Emergency Events
Making 911 Data Real Time
Streaming 911 Emergency Data For Baltimore, MD

We've targeted these three areas because they make a difference in our lives at the local level, and because they have huge potential once the data is made available via web APIs and streamed in real time using Server-Sent Events (SSE). Now that we have these three critical aspects of municipal operations profiled, we are going to work to profile as many cities as we can.
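To make the Server-Sent Events idea concrete, here is a minimal sketch of how a client might turn an SSE text stream into incident records. It is an illustration only: the `parse_sse` helper, the `incident` event name, and the JSON fields are hypothetical and not taken from any of the city feeds mentioned above. It also skips parts of the SSE format such as the `id` and `retry` fields.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from an iterable of text lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; emit it only if it has data.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # drop the single optional leading space
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
            # Other fields (id, retry) are ignored in this sketch.


# Hypothetical stream: the event name and JSON payload fields are made up.
sample_stream = [
    ": keep-alive\n",
    "event: incident\n",
    'data: {"service": "311", "city": "Chicago", "type": "pothole"}\n',
    "\n",
    'data: {"service": "911", "city": "Baltimore", "type": "fire"}\n',
    "\n",
]

events = list(parse_sse(sample_stream))
incidents = [json.loads(e["data"]) for e in events]
```

In a real client the lines would come from a long-lived HTTP response with the `text/event-stream` content type instead of a list; the parsing logic would stay the same.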

Lessons from 2 Million Machine Learning Models on Kaggle


Lessons from Kaggle competitions, including why XGBoost (gradient boosting) is the top method for structured problems, why neural networks and deep learning dominate unstructured problems (visuals, text, sound), and the 2 types of problems for which Kaggle is suitable. Here is a summary of Anthony Goldbloom's presentation at the Data Science Chicago Meetup on Nov 2, 2015. It was nice to see that Anthony comes from financial statistics and econometrics (he mentioned his first job was with the Reserve Bank of Australia).