Data Science for Startups: Data Pipelines – Towards Data Science


You can find links to all of the posts in the introduction. Building data pipelines is a core component of data science at a startup. In order to build data products, you need to be able to collect data points from millions of users and process the results in near real-time. While my previous blog post discussed what type of data to collect and how to send data to an endpoint, this post will discuss how to process data that has been collected, enabling data scientists to work with the data. The next post, on model production, will discuss how to deploy models on this data platform. Typically, the destination for a data pipeline is a data lake, such as Hadoop or Parquet files on S3, or a relational database, such as Redshift. There are a number of other useful properties that a data pipeline should have, but this is a good starting point for a startup. As you start to build additional components that depend on your data pipeline, you'll want to set up tooling for fault tolerance and task automation.
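To make the collect, process and store flow more concrete, here is a minimal Python sketch of one batch processing step. It rests on several assumptions.

- Events arrive as JSON lines with hypothetical `user_id`, `event`, `ts` and `amount` fields.
- Malformed records are skipped.
- Output goes to a local directory laid out in the `dt=YYYY-MM-DD/` partition style that is common in data lakes.

CSV stands in for Parquet so the example needs only the standard library. The function names are illustrative and do not come from the original post.

```python
import csv
import json
import os
import tempfile
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw events, as they might arrive from a tracking endpoint (JSON lines).
RAW_EVENTS = [
    '{"user_id": "u1", "event": "session_start", "ts": 1514764800}',
    '{"user_id": "u2", "event": "purchase", "ts": 1514851200, "amount": 9.99}',
    "not valid json",
    '{"user_id": "u1", "event": "session_end", "ts": 1514768400}',
]

# Columns written to the output files (illustrative schema).
FIELDS = ["user_id", "event", "ts", "amount"]


def parse(line):
    """Return a cleaned event dict, or None if the record is malformed."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "user_id" not in record or "ts" not in record:
        return None
    # Keep only known fields; missing optional fields become None.
    return {field: record.get(field) for field in FIELDS}


def run_pipeline(lines, output_root):
    """Group valid events by UTC date and write one CSV per dt=YYYY-MM-DD partition."""
    partitions = defaultdict(list)
    for line in lines:
        event = parse(line)
        if event is None:
            continue  # a production pipeline might route these to a dead-letter location
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%d")
        partitions[day].append(event)

    row_counts = {}
    for day, events in partitions.items():
        part_dir = os.path.join(output_root, f"dt={day}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "events.csv"), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(events)  # None values are written as empty cells
        row_counts[day] = len(events)
    return row_counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(run_pipeline(RAW_EVENTS, root))  # rows written per date partition
```

Partitioning by date lets downstream queries read only the days they need. A natural next step would be to write Parquet files to an S3 prefix instead of CSV files to a local directory.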

Real-time streaming predictions using Google Cloud Dataflow and Google Cloud Machine Learning


Google Cloud Dataflow is probably already embedded somewhere in your daily life, enabling companies to process huge amounts of data in real time. Now imagine combining this, also in real time, with the predictive power of neural networks. That is exactly what our latest blog post covers! It all started with some experimenting with Apache Beam, an incubating Apache project that provides a single programming model for both batch and stream processing jobs. We wanted to test its streaming capabilities by running a pipeline on Google Cloud Dataflow, Google's managed service for running such pipelines.
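In Beam's streaming model, elements are assigned to windows and then aggregated per window. The plain-Python sketch below is not the Apache Beam API. It only illustrates the idea of combining a stream with predictions.

- Each event is scored by a hypothetical `predict` function, a toy threshold rule standing in for a deployed neural network.
- Positive predictions are counted per fixed 60-second window.
- The event fields and the window size are assumptions.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # illustrative fixed-window size


def predict(event):
    """Hypothetical stand-in for a deployed model; a toy threshold keeps the sketch self-contained."""
    return 1 if event["value"] > 10.0 else 0


def window_start(ts, size=WINDOW_SECONDS):
    """Start time of the fixed (tumbling) window containing timestamp ts."""
    return ts - (ts % size)


def process_stream(events, size=WINDOW_SECONDS):
    """Score each event as it arrives and count positive predictions per window."""
    positives = defaultdict(int)
    for event in events:
        positives[window_start(event["ts"], size)] += predict(event)
    return dict(positives)


if __name__ == "__main__":
    stream = [
        {"ts": 0, "value": 12.0},
        {"ts": 30, "value": 3.0},
        {"ts": 59, "value": 15.0},
        {"ts": 61, "value": 11.0},
        {"ts": 125, "value": 2.0},
    ]
    print(process_stream(stream))  # {0: 2, 60: 1, 120: 0}
```

A real streaming runner such as Dataflow also has to decide when a window is complete, which involves watermarks, late data and triggers. This sketch sidesteps that by processing a finite list.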

Google Cloud: Big Data, IoT and AI Offerings - Datamation


Continuing Datamation's series on big data, Internet of Things (IoT) and artificial intelligence offerings from major cloud providers, it's time to switch gears from Microsoft Azure to Google Cloud Platform. And given the vast amounts of data that power the search giant's services, it's only fitting to start with big data and analytics.

Real-time forecasts in the cloud: from market feed capture to ML predictions – Google Cloud Big Data and Machine Learning Blog – Google Cloud Platform


If you're in the financial services industry or have an interest in predicting market movements with machine learning, you may be eager to learn how to move your trading signal and forecast generation code into the cloud. You can easily scale up your computational loads, distribute data processing pipelines to run in parallel on multiple machines, speed up the time required to run complex analytics, remove the burden of managing data storage, and ultimately eliminate the need for multiple data centers. In this post, we'll show how to build a data processing pipeline that starts with a market data feed as the input and uses machine learning to generate real-time forecasts as the output, with all application components running natively in Google Cloud. In the sections below, you'll learn how to build a complete end-to-end application that subscribes to the Thomson Reuters FX (foreign exchange) data feed published on a Cloud Pub/Sub topic, incrementally trains a TensorFlow neural network model, generates real-time forecasts of FX rates, and saves the forecasts into BigQuery for subsequent analysis. First, we'll focus on Cloud Pub/Sub as the connector used to link multiple application components.
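The subscribe, train, forecast and store loop described above can be sketched with standard-library stand-ins.

- A `queue.Queue` plays the role of the Cloud Pub/Sub subscription.
- A Python list plays the role of the BigQuery table.
- A linear model over the last three rates replaces the TensorFlow neural network used in the post. It is updated by one stochastic-gradient step per tick.

All names, the lag count and the learning rate are illustrative assumptions, not details from the original application.

```python
import queue

LAGS = 3              # past rates used as features (illustrative choice)
LEARNING_RATE = 0.05  # SGD step size (illustrative choice)


class OnlineLinearForecaster:
    """Linear model over the last LAGS rates, updated with one SGD step per new tick."""

    def __init__(self, lags=LAGS, lr=LEARNING_RATE):
        self.weights = [0.0] * lags
        self.bias = 0.0
        self.lr = lr

    def predict(self, window):
        return sum(w * x for w, x in zip(self.weights, window)) + self.bias

    def update(self, window, target):
        # One gradient step on the squared error 0.5 * (prediction - target) ** 2.
        error = self.predict(window) - target
        self.weights = [w - self.lr * error * x for w, x in zip(self.weights, window)]
        self.bias -= self.lr * error


def run(subscription, model, sink, lags=LAGS):
    """Drain the subscription: train on each new rate, then forecast the next one."""
    history = []
    while True:
        try:
            rate = subscription.get_nowait()
        except queue.Empty:
            break
        if len(history) >= lags:
            # Incremental training: the new rate is the target for the previous window.
            model.update(history[-lags:], rate)
        history.append(rate)
        if len(history) >= lags:
            # Forecast the next rate and "save" it (the list stands in for a BigQuery table).
            sink.append({"tick": len(history), "forecast": model.predict(history[-lags:])})
    return sink


if __name__ == "__main__":
    ticks = queue.Queue()
    for rate in [1.17, 1.18, 1.16, 1.19, 1.20, 1.18]:  # made-up rates
        ticks.put(rate)
    for row in run(ticks, OnlineLinearForecaster(), []):
        print(row)
```

Each new rate is first used as the training target for the previous window. Only then is it added to the history used for the next forecast, so the model never trains on a value it is about to predict.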

Build a social media dashboard using machine learning and BI services – Amazon Web Services


In this blog post we'll show you how you can use Amazon Translate, Amazon Comprehend, Amazon Kinesis, Amazon Athena, and Amazon QuickSight to build a natural-language-processing (NLP)-powered social media dashboard for tweets. Social media conversations are a low-cost way to acquire leads, improve website traffic, develop customer relationships, and improve customer service. We'll build a serverless data processing and machine learning (ML) pipeline that provides a multilingual social media dashboard of tweets within Amazon QuickSight. We'll leverage API-driven ML services that allow developers to easily add intelligence to any application, such as computer vision, speech, language analysis, and chatbot functionality, simply by calling a highly available, scalable, and secure endpoint. These building blocks will be put together with very little code by leveraging serverless offerings within AWS.
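One enrichment step in such a pipeline might look like the sketch below. It translates non-English tweets, scores sentiment and emits a flat record that can be queried later.

- The `translate` and `detect_sentiment` callables are injected so the logic can run offline.
- In a real deployment they could wrap boto3's Amazon Translate `translate_text` and Amazon Comprehend `detect_sentiment` calls.
- The record fields and the fake services in the example are hypothetical.

```python
import json


def enrich_tweet(tweet, translate, detect_sentiment):
    """Turn one raw tweet into a flat record for storage and querying.

    `translate` and `detect_sentiment` are injected callables (hypothetical
    wrappers), so the flow can be exercised offline without AWS credentials.
    """
    text = tweet["text"]
    # Only non-English tweets are sent for translation.
    text_en = text if tweet.get("lang") == "en" else translate(text)
    return {
        "tweet_id": tweet["id"],
        "original_lang": tweet.get("lang"),
        "original_text": text,
        "text_en": text_en,
        "sentiment": detect_sentiment(text_en),
    }


def to_json_lines(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


if __name__ == "__main__":
    # Fake services used only for this local example.
    def fake_translate(text):
        return {"hola a todos": "hello everyone"}.get(text, text)

    def fake_sentiment(text):
        return "POSITIVE" if "hello" in text else "NEUTRAL"

    tweets = [
        {"id": 1, "text": "hola a todos", "lang": "es"},
        {"id": 2, "text": "new release today", "lang": "en"},
    ]
    records = [enrich_tweet(t, fake_translate, fake_sentiment) for t in tweets]
    print(to_json_lines(records))
```

Writing one JSON object per line is a layout Athena can query through a JSON SerDe. Injecting the service calls keeps the enrichment logic testable without network access.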