Using Microsoft R Server on a Single Machine for Experiments With 600M Taxi Rides

The New York City taxi dataset is one of the largest publicly available datasets, with information about 1.1 billion NYC taxi rides. It has been explored and visualized in a number of blog posts using a variety of technologies (e.g., PostgreSQL and Elasticsearch). A recent blog post showed how to build machine learning models over one year's worth of this data using Microsoft R Server (MRS) running on a 4-node Hadoop cluster. In a new blog post, Microsoft Data Scientist Dmitry Pechyoni shows how to build a binary classification model that predicts whether a passenger will pay a tip. Dmitry used MRS to drive the entire process of building and evaluating machine learning models over hundreds of millions of examples on a single commodity machine.