Spark AI Summit: Bay Area Apache Spark Meetup @ Moscone Center, SF
In this talk, Richard Garris, Principal Solutions Architect at Databricks, will explain how various ML algorithms are parallelized in Apache Spark. Andrew Ng describes the algorithms as the "rocket ship" and the data as "the fuel that you feed machine learning" when building deep learning applications. We will start with an understanding of machine learning pipelines built with single-machine tools such as pandas, scikit-learn, and R. Then we will discuss how Apache Spark MLlib can be used to parallelize your machine learning pipeline, using Linear Regression and Random Forest. Lastly, we will discuss ways to parallelize single-machine algorithms in Spark by broadcasting the data and then performing distributed feature selection, model creation, or hyperparameter tuning.

Bio: Richard Garris is a Principal Solutions Architect at Databricks, focused on helping clients with their advanced analytics initiatives using Apache Spark and MLlib.
May-23-2018, 23:32:06 GMT