Building data pipelines is a core component of data science at a startup. In order to build data products, you need to be able to collect data points from millions of users and process the results in near real-time. While my previous blog post discussed what type of data to collect and how to send data to an endpoint, this post will discuss how to process data that has been collected, enabling data scientists to work with the data. The coming blog post on model production will discuss how to deploy models on this data platform. You can find links to all of the posts in the introduction. Typically, the destination for a data pipeline is a data lake, such as Hadoop or Parquet files on S3, or a relational database, such as Redshift. There are a number of other useful properties that a data pipeline should have, but this is a good starting point for a startup. As you start to build additional components that depend on your data pipeline, you'll want to set up tooling for fault tolerance and automating tasks.
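As a concrete illustration of the "collect, validate, then load" flow described above, here is a minimal sketch of a pipeline stage that parses raw JSON events and batches them by type before they would be appended to a destination such as Parquet files on S3 or Redshift. The field names (`user_id`, `event_type`) and function names are hypothetical, not taken from the post's actual schema:

```python
import json
from collections import defaultdict

def validate(raw):
    """Parse a raw JSON event; keep only records with the required fields.
    (Field names are hypothetical -- adjust to your own event schema.)"""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if "user_id" in event and "event_type" in event:
        return event
    return None

def batch_by_type(raw_events):
    """Group validated events by type, ready to be written out in bulk
    (e.g. as Parquet files on S3, or a COPY into Redshift)."""
    batches = defaultdict(list)
    for raw in raw_events:
        event = validate(raw)
        if event is not None:
            batches[event["event_type"]].append(event)
    return dict(batches)

stream = [
    '{"user_id": 1, "event_type": "click"}',
    '{"user_id": 2, "event_type": "purchase"}',
    'not json',  # malformed input is dropped, not fatal
    '{"user_id": 3, "event_type": "click"}',
]
print(batch_by_type(stream))
```

Batching by event type mirrors the common pattern of landing each event type in its own table or S3 prefix, which keeps downstream queries cheap.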
With the Internet of Things (IoT), vehicles are evolving from self-contained commodities focused on transportation into sophisticated, Internet-connected endpoints often capable of two-way communication. The new data streams generated by modern connected vehicles drive innovative business models such as usage-based insurance, enable new in-vehicle experiences, and build the foundation for advances such as autonomous driving and vehicle-to-vehicle (V2V) communication. Through all this, we here at Google Cloud are excited to help make this world a reality. We recently published a solution guide that describes how various Google Cloud Platform (GCP) services fit into the picture. Vehicles can produce upwards of 560 GB of data per vehicle, per day.
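To put the 560 GB/day figure in perspective, a quick back-of-the-envelope calculation (assuming decimal gigabytes and a sustained, even rate over 24 hours) shows what that means for per-vehicle ingest bandwidth:

```python
# Sustained ingest rate implied by 560 GB per vehicle, per day
# (assumes 1 GB = 1e9 bytes and an even rate over 24 hours).
BYTES_PER_DAY = 560 * 1e9
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

rate_mb_per_s = BYTES_PER_DAY / SECONDS_PER_DAY / 1e6
print(f"{rate_mb_per_s:.2f} MB/s per vehicle")  # ~6.48 MB/s

# Scaled to a hypothetical fleet of 10,000 connected vehicles:
fleet_tb_per_day = 10_000 * 560 * 1e9 / 1e12
print(f"{fleet_tb_per_day:.0f} TB/day for a 10,000-vehicle fleet")
```

Even a modest fleet quickly reaches petabyte-scale storage per year, which is why the solution guide leans on managed ingestion and analytics services rather than self-hosted infrastructure.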
Google announced that it is adding a fully managed IoT PaaS to its cloud platform. Google Cloud IoT Core, the latest addition to Google Cloud Platform, brings device management, machine-to-machine communication, security, and analytics to connected devices. For hyper-scale cloud providers, an IoT platform service is a logical extension of existing building-block services. Built on the underlying compute, storage, networking, security, and database infrastructure, an IoT PaaS is a verticalized solution for managing connected devices. Amazon, IBM, and Microsoft have been offering IoT PaaS to customers since 2015.
In the right architecture, machine-learning functionality takes data analytics to the next level of value. Editor's note: This guest post (translated from Italian and originally published in late 2016) by Lorenzo Ridi, of Google Cloud Platform partner Noovle of Italy, describes a POC for building an end-to-end analytic pipeline on GCP that includes machine-learning functionality. "Black Friday" is traditionally the biggest shopping day of the year in the United States. Black Friday can be a great opportunity to promote products, raise brand awareness, and kick off the holiday shopping season with a bang. During that period, whatever type of retail is involved, it's also becoming increasingly important to monitor and respond to consumer sentiment and feedback across social media channels.
Continuing Datamation's series on big data, Internet of Things (IoT) and artificial intelligence offerings from major cloud providers, it's time to switch gears from Microsoft Azure to Google Cloud Platform. And given the vast amounts of data that power the search giant's services, it's only fitting to start with big data and analytics.