Using R for Scalable Data Science: Single Machines to Hadoop Spark Clusters

Nov-14-2017, 23:00:53 GMT–VideoLectures.NET

In this tutorial, we will demonstrate how to create scalable, end-to-end data analysis processes in R on single machines as well as in-database in SQL Server and on Hadoop clusters running Spark. We will provide hands-on exercises as well as code in a public GitHub repository for attendees to adopt in their data science practice. In particular, the attendees will see how to build, persist, and consume machine learning models using distributed machine learning functions in R. R is one of the most used languages in the data science, statistical and machine learning (ML) community. Although open-source R (CRAN library) now has in excess of 10,000 packages and functions for statics and ML, when it comes to scalable analysis using R, or deployment of trained models into production, many data scientists are blocked or hindered by (a) its limitations of available functions to handle large datasets efficiently, and (b) knowledge about the appropriate computing environments to scale R scripts from desktop analysis to elastic and distributed cloud services. In this tutorial, we will discuss how to create end-to-end data science solutions that utilize distributed compute resources.

artificial intelligence, data mining, machine learning, (9 more...)

VideoLectures.NET

Nov-14-2017, 23:00:53 GMT

News Web Page

Add feedback

Genre:
- Instructional Material > Training Manual (0.40)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (0.64)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found