Collaborating Authors

Real-time streaming predictions using Google Cloud Dataflow and Google Cloud Machine Learning


Google Cloud Dataflow is probably already embedded somewhere in your daily life, and it enables companies to process huge amounts of data in real time. But imagine that you could combine this, also in real time, with the predictive power of neural networks. This is exactly what we will talk about in our latest blog post! It all started with some fiddling around with Apache Beam, an incubating Apache project that provides a programming model that handles both batch and stream processing jobs. We wanted to test the streaming capabilities by running a pipeline on Google Cloud Dataflow, a Google-managed service for running such pipelines.
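To make the idea of one programming model for both batch and stream processing concrete, here is a minimal sketch in plain Python. It deliberately does not use the Apache Beam API; the names `parse_event`, `run_pipeline`, and `live_feed`, as well as the `sensor_id,value` record format, are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, Tuple


def parse_event(line: str) -> Tuple[str, float]:
    """Split a hypothetical 'sensor_id,value' record into typed fields."""
    sensor_id, value = line.split(",")
    return sensor_id.strip(), float(value)


def run_pipeline(source: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Apply the same transforms lazily to any source, bounded or unbounded."""
    for line in source:
        sensor_id, value = parse_event(line)
        if value >= 0:  # drop negative readings (assumed invalid here)
            yield sensor_id, value


def live_feed() -> Iterator[str]:
    """Stand-in for a streaming source; a real one would not terminate."""
    yield "a,1.5"
    yield "b,-2.0"
    yield "a,3.0"


# Batch: a bounded list of records.
batch_result = list(run_pipeline(["a,1.5", "b,-2.0", "a,3.0"]))

# Streaming: records are handled one by one as the generator produces them.
stream_result = list(run_pipeline(live_feed()))
```

Because `run_pipeline` only iterates over its input, the same transform logic serves both cases; what changes is the source, which is roughly the separation a unified batch/stream model aims for.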

Data Science for Startups: Data Pipelines – Towards Data Science


You can find links to all of the posts in the introduction. Building data pipelines is a core component of data science at a startup. In order to build data products, you need to be able to collect data points from millions of users and process the results in near real-time. While my previous blog post discussed what type of data to collect and how to send data to an endpoint, this post will discuss how to process data that has been collected, enabling data scientists to work with the data. The coming blog post on model production will discuss how to deploy models on this data platform. Typically, the destination for a data pipeline is a data lake, such as Hadoop or Parquet files on S3, or a relational database, such as Redshift. There are a number of other useful properties that a data pipeline should have, but this is a good starting point for a startup. As you start to build additional components that depend on your data pipeline, you'll want to set up tooling for fault tolerance and automating tasks.

Google Cloud: Big Data, IoT and AI Offerings - Datamation


Continuing Datamation's series on big data, Internet of Things (IoT) and artificial intelligence offerings from major cloud providers, it's time to switch gears from Microsoft Azure to Google Cloud Platform. And given the vast amounts of data that power the search giant's services, it's only fitting to start with big data and analytics.

Anomaly detection using streaming analytics & AI


An organization's ability to quickly detect and respond to anomalies is critical to success in a digitally transforming culture. Google Cloud customers can strengthen this ability by using rich artificial intelligence and machine learning (AI/ML) capabilities in conjunction with an enterprise-class streaming analytics platform. We refer to this combination of fast data and advanced analytics as real-time AI. There are many applications for real-time AI across businesses, including anomaly detection, video analysis, and forecasting. In this post, we walk through a real-time AI pattern for detecting anomalies in log files.
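As a rough illustration of what detecting anomalies in a stream of log data can look like, the sketch below flags per-minute error counts that deviate strongly from a trailing window. This is a simple z-score heuristic swapped in for illustration, not the pattern the post itself describes; the function name `detect_anomalies`, the window size, the threshold, and the sample counts are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def detect_anomalies(
    counts: Iterable[int], window: int = 5, threshold: float = 3.0
) -> Iterator[Tuple[int, int]]:
    """Yield (index, count) for counts far from the trailing-window mean."""
    history: Deque[int] = deque(maxlen=window)
    for i, count in enumerate(counts):
        # Only score a value once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip scoring when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(count - mu) / sigma > threshold:
                yield i, count
        history.append(count)


# Hypothetical error counts per minute, e.g. aggregated from log lines.
errors_per_minute = [10, 12, 11, 9, 10, 11, 95, 10, 12]
anomalies = list(detect_anomalies(errors_per_minute))
```

Note that a flagged value is still appended to the window, so a large spike inflates the spread for the next few observations; a production system would likely handle that more carefully.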



Recently I completed the Data Engineering on Google Cloud Platform Specialization (link here) through Coursera; here is my review. The only problem was a couple of issues in the final labs of the course. You can take the modules out of order or complete them sequentially. It's up to you, though I'd recommend keeping it at least roughly sequential. I went from 1 to 3, then went back to 2, then 4, and then 5. The courses are hosted by Valliappa Lakshmanan from Google.