Announcing SparkR: R on Apache Spark
I am excited to announce that the upcoming Apache Spark 1.4 release will include SparkR, an R package that allows data scientists to analyze large datasets and interactively run jobs on them from the R shell. R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks. However, interactive data analysis in R is usually limited because the runtime is single-threaded and can only process datasets that fit in a single machine's memory. SparkR provides an R frontend to Apache Spark and, by using Spark's distributed computation engine, allows us to run large-scale data analysis from the R shell. The project was initially started in the AMPLab as an effort to explore different techniques to integrate the usability of R with the scalability of Spark.
Jun-28-2016, 02:31:10 GMT