Scalable Machine Learning with Spark
Since the early 2000s, the amount of data collected has increased enormously with the rise of internet giants such as Google, Netflix, YouTube, Amazon, and Facebook. Around 2010, another "data wave" came about when mobile phones became hugely popular. In the 2020s, we anticipate another exponential rise in data as IoT devices become all-pervasive. Given this backdrop, building scalable systems is a sine qua non for machine learning solutions. Before 2005, parallel processing libraries such as MPI and PVM were popular for compute-heavy tasks, and TensorFlow was later designed on the basis of them. Database design, meanwhile, aimed to reduce data redundancy by dividing larger tables into smaller ones and linking them through relationships (normalization).
Jun-25-2021, 06:40:20 GMT