Using Distributed Machine Learning to Model Big Data Efficiently
To use Spark, we can either run it on an AWS EMR cluster or, if you just want to try it out and play with it, run it on a local Jupyter notebook. There are many great articles on how to set up a notebook on AWS EMR to use PySpark, such as this one. The EMR cluster configuration will also largely affect your runtime, which I will discuss in the last part. For preprocessing the data, I will be using Spark RDD manipulation to perform exploratory data analysis and visualization. The rest of the Spark preprocessing code and the Plotly visualization code can be found in the GitHub repo, but here are the graphs from our initial exploratory analysis.
[Interactive Plotly charts: breakdown of the data by Country (location hierarchy, e.g. North America > United States > California > San Francisco County > San Francisco) and by Technology]