 sparklyr


Tidy Time Series Forecasting in R with Spark

#artificialintelligence

I'm SUPER EXCITED to show fellow time-series enthusiasts a new way to scale time series analysis using an amazing technology called Spark! Without Spark, large-scale forecasting projects of 10,000 time series can take days to run because of long-running for-loops and the need to test many models on each time series. Spark has been widely accepted as a "big data" solution, and we'll use it to scale out (distribute) our time series analysis to Spark clusters and run it in parallel. Modeltime is a state-of-the-art forecasting library that I personally developed for "Tidy Forecasting" in R. Modeltime now integrates a Spark backend capable of forecasting 10,000 time series on distributed Spark clusters.


Looking to the future for R in Azure SQL and SQL Server - Microsoft SQL Server Blog

Data science, machine learning, and analytics have redefined how we look at the world. The R community plays a vital role in that transformation, and the R language continues to be the de facto choice for statistical computing, data analysis, and many machine learning scenarios. The SQL Server team first recognized the importance of R back in 2016 with the launch of SQL ML Services and R Server. Since then we have added Python to SQL ML Services in 2017 and Java support through our language extensions in 2019. Earlier this year we also announced the general availability of SQL ML Services in Azure SQL Managed Instance.


sparklyr/sparklyr

You can connect to both local instances of Spark and remote Spark clusters. Here we'll connect to a local instance of Spark via the spark_connect function. The returned Spark connection (sc) provides a remote dplyr data source to the Spark cluster. For more information on connecting to remote Spark clusters, see the Deployment section of the sparklyr website. We can now use all of the available dplyr verbs against the tables within the cluster. We'll start by copying some datasets from R into the Spark cluster (note that you may need to install the nycflights13 and Lahman packages in order to execute this code). A simple filtering example is a good place to start, and Introduction to dplyr provides additional dplyr examples you can try.
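The connect, copy, and filter steps described here can be sketched roughly as follows. This is a minimal sketch that assumes a local Spark installation (for example one set up with spark_install()) and that the nycflights13 and Lahman packages are installed; the table names "flights" and "batting" and the specific filter condition are illustrative choices, not taken from the original post.

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (assumes Spark is installed locally).
sc <- spark_connect(master = "local")

# Copy R data frames into Spark. The table names are illustrative choices.
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)
batting_tbl <- copy_to(sc, Lahman::Batting, "batting", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and evaluated lazily in Spark.
delayed_by_two <- flights_tbl %>% filter(dep_delay == 2)

# collect() pulls the filtered rows back into a local R data frame.
delayed_local <- collect(delayed_by_two)
```

When the session is finished, spark_disconnect(sc) closes the connection.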


Comparison of ML Classifiers Using Sparklyr

You can use sparklyr to run a variety of classifiers in Apache Spark. For the Titanic data, the best-performing models were tree-based. Gradient-boosted trees were among the best models but also had a much longer average run time than the others. Random forests and decision trees both performed well and ran quickly. While these models were run on a tiny data set in a local Spark cluster, the same methods will scale to data in a distributed Apache Spark cluster.
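A comparison along these lines might look like the sketch below. It is not the original post's code: as a stand-in for the Titanic data it assumes the built-in mtcars data set with the binary column am as the label, and the table name "mtcars_cmp", the chosen predictors, and the helper compare_one are hypothetical. It fits four sparklyr classifiers, times each fit, and scores accuracy on a held-out split.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Stand-in data (assumption): mtcars with the binary column `am` as the label.
cars_tbl <- copy_to(sc, mtcars, "mtcars_cmp", overwrite = TRUE)
splits <- sdf_random_split(cars_tbl, training = 0.7, test = 0.3, seed = 42)

# Candidate classifiers, all called through sparklyr's formula interface.
fitters <- list(
  logistic      = ml_logistic_regression,
  decision_tree = ml_decision_tree_classifier,
  random_forest = ml_random_forest_classifier,
  gbt           = ml_gbt_classifier
)

# Hypothetical helper: fit one model, time the fit, score held-out accuracy.
compare_one <- function(fit_fun) {
  elapsed <- system.time(
    model <- fit_fun(splits$training, am ~ wt + hp + cyl)
  )[["elapsed"]]
  predictions <- ml_predict(model, splits$test)
  accuracy <- ml_multiclass_classification_evaluator(
    predictions,
    metric_name = "accuracy"
  )
  data.frame(seconds = elapsed, accuracy = accuracy)
}

# One row per classifier, with fit time in seconds and test accuracy.
results <- do.call(rbind, lapply(fitters, compare_one))
results
```

On a data set this small the timings and accuracies are only indicative; the point is that the same calls apply unchanged to a table stored in a distributed cluster.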


sparklyr -- R interface for Apache Spark

H2O Sparkling Water supports a wide array of algorithms, and it's easy to chain these functions together with dplyr pipelines. To learn more, see the H2O Sparkling Water section.


R Addict Blog

Machine and statistical learning wizards are increasingly eager to run their analyses with the Spark ML library whenever possible. It's trendy, posh, spicy, and gives the feeling of doing state-of-the-art machine learning and keeping up with the newest computational trends. It is even more sexy and powerful when the computations run on an enormous cluster: say, 100 machines on a YARN Hadoop cluster make you a real data cruncher! In this post I present the sparklyr package (by RStudio), the connector that will transform you from a regular R user to the supa! Moreover, I show how I have extended the interface to the K-means procedure so that it is now also possible to compute the cost for that model, which can help in determining the number of clusters in segmentation problems.