You can find links to all of the posts in the introduction. Building data pipelines is a core component of data science at a startup. In order to build data products, you need to be able to collect data points from millions of users and process the results in near real-time. While my previous blog post discussed what type of data to collect and how to send data to an endpoint, this post will discuss how to process data that has been collected, enabling data scientists to work with the data. The coming blog post on model production will discuss how to deploy models on this data platform. Typically, the destination for a data pipeline is a data lake, such as Hadoop or parquet files on S3, or a relational database, such as Redshift. There's a number of other useful properties that a data pipeline should have, but this is a good starting point for a startup. As you start to build additional components that depend on your data pipeline, you'll want to set up tooling for fault tolerance and automating tasks.
Continuing Datamation's series on big data, Internet of Things (IoT) and artificial intelligence offerings from major cloud providers, it's time to switch gears from Microsoft Azure to Google Cloud Platform. And given the vast amounts of data that powers the search giant's services, it's only fitting to start with big data and analytics.
In the right architecture, machine-learning functionality takes data analytics to the next level of value. Editor's note: This guest post (translated from Italian and originally published in late 2016) by Lorenzo Ridi, of Google Cloud Platform partner Noovle of Italy, describes a POC for building an end-to-end analytic pipeline on GCP that includes machine-learning functionality. "Black Friday" is traditionally the biggest shopping day of the year in the United States. Black Friday can be a great opportunity to promote products, raise brand awareness and kick-off the holiday shopping season with a bang. During that period, whatever the type of retail involved, it's also becoming increasingly important to monitor and respond to consumer sentiment and feedback across social media channels.
About this course: This 1-week, accelerated on-demand course builds upon Google Cloud Platform Big Data and Machine Learning Fundamentals. Through a combination of video lectures, demonstrations, and hands-on labs, you'll learn how to build streaming data pipelines using Google Cloud Pub/Sub and Dataflow to enable real-time decision making. You will also learn how to build dashboards to render tailored output for various stakeholder audience.
At Pivotal Data Science, our primary charter is to help our customers derive value from their data assets, be it in the reduction of cost or by increasing revenue by offering better products and services. While we are not working on customer engagements, we engage in R&D using our wide array of products. For instance, we may contribute a new module to PDLTools or MADlib - our distributed in-database machine learning libraries, we might build end-to-end demos such as these or experiment with new technology and blog about them here. Last quarter, we set out to explore data science microservices for operationalizing our models for real-time scoring. Microservices have been the most talked about topic in many Cloud conferences of late. They've gained a large fan following by application developers, solution architects, data scientists and engineers alike.