Stop using Spark for ML!
Spark is great if you have a large volume of data to process. Spark and PySpark (the Python API for interacting with Spark) are key tools in a data engineer's toolbelt. The usual argument goes: "No matter how big your data grows, you will still be able to process it." That argument holds for modern companies that build "classic" data pipelines, using Spark end-to-end to combine, clean, transform, and aggregate their data into an output dataset. It does not always hold, however, for data scientists and ML engineers building pipelines whose output is a machine learning model.
Oct-4-2021, 20:02:03 GMT