This article was published as a part of the Data Science Blogathon. Cloud computing is the delivery of computer system resources, such as storage and computing power, from remote servers that are managed by a provider and accessed over the internet. Over the last five years, demand for cloud computing has grown steadily, and many new cloud service providers have entered the market. One of the most popular cloud services is Google Cloud Platform. In this article, we take a deep dive into the ML pipeline in GCP (Google Cloud Platform).
Every application generates data, but what does that data mean? This is the question data scientists are hired to answer. There is no doubt that this information is one of the most precious commodities a business has. But making sense of the data, by creating insights and turning them into decisions, is even more important. As data keeps growing in volume, analytics pipelines have to scale to adapt to the rate of change.
In my August 2020 article, "How to choose a cloud Machine Learning platform," my first guideline for choosing a platform was, "Be close to your data." Keeping the code near the data is necessary to keep the latency low, since the speed of light limits transmission speeds. After all, machine learning -- especially deep learning -- tends to go through all your data multiple times (each time through is called an epoch). I said at the time that the ideal case for very large data sets is to build the model where the data already resides, so that no mass data transmission is needed. Several databases support that to a limited extent.
One of the perks of using Google Cloud Platform (GCP) is having BigQuery, Google's cloud-hosted data warehouse solution, at your disposal. BigQuery gives GCP users access to the key features of Dremel, Google's own internal data warehouse solution. Under the hood, Dremel stores data in a columnar format and uses a tree architecture to parallelise queries across thousands of machines, with each query performing a full scan of the columns it reads rather than looking up individual rows. So, what is so great about that? With BigQuery you can run SQL queries on a table with billions of rows and often get the results in seconds!
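To make the columnar format and the tree architecture a little more concrete, here is a minimal Python sketch. It is not Dremel or BigQuery code: the function names (`to_columns`, `split_into_shards`, `tree_sum`, `sum_column`), the shard count, and the sample rows are all hypothetical, and the "machines" are simulated as list slices inside a single process. The idea it illustrates is that an aggregate over one column only needs that column's values, and that partial results from many leaves can be combined level by level.

```python
from typing import Dict, List

# Hypothetical sample data in the usual row-oriented shape.
ROWS = [
    {"user": "a", "country": "US", "amount": 10},
    {"user": "b", "country": "DE", "amount": 25},
    {"user": "c", "country": "US", "amount": 5},
    {"user": "d", "country": "IN", "amount": 60},
    {"user": "e", "country": "DE", "amount": 40},
]


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot row records into one list per column (a columnar layout).

    Assumes every row has the same keys.
    """
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def split_into_shards(values: list, shard_count: int) -> List[list]:
    """Split a column into contiguous shards, one per simulated leaf machine."""
    size = max(1, -(-len(values) // shard_count))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def tree_sum(partials: List[float], fan_in: int = 2) -> float:
    """Combine partial sums level by level, like intermediate nodes in a tree."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not partials:
        return 0
    level = partials
    while len(level) > 1:
        # Each parent node adds up the results of up to `fan_in` children.
        level = [sum(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def sum_column(columns: Dict[str, list], name: str, shard_count: int = 4) -> float:
    """Sum one column: each leaf scans only its shard of that single column."""
    shards = split_into_shards(columns[name], shard_count)
    partials = [sum(shard) for shard in shards]  # would run in parallel on real leaves
    return tree_sum(partials)


columns = to_columns(ROWS)
print(sum_column(columns, "amount"))
```

Note that `sum_column` never touches the `user` or `country` values. In a row-oriented layout the same query would have to read every field of every row, which is one reason a columnar layout suits analytical queries over very wide or very long tables.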