sparklyr/sparklyr
You can connect to both local instances of Spark and remote Spark clusters. Here we'll connect to a local instance of Spark via the spark_connect function. The returned Spark connection (sc) provides a remote dplyr data source to the Spark cluster. For more information on connecting to remote Spark clusters, see the Deployment section of the sparklyr website.

We can now use all of the available dplyr verbs against the tables within the cluster. We'll start by copying some datasets from R into the Spark cluster (note that you may need to install the nycflights13 and Lahman packages in order to execute this code).

A simple filter is a good place to start, and Introduction to dplyr provides additional dplyr examples you can try.
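The steps described above (connect, copy data into the cluster, then filter with dplyr) can be sketched roughly as follows. This is a minimal illustration rather than the project's canonical example: it assumes a local Spark installation is already available (for instance one set up with spark_install()), and the table names "flights" and "batting" as well as the filter condition dep_delay == 2 are choices made here for illustration.

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance. Assumes Spark is already installed
# locally, e.g. via sparklyr::spark_install().
sc <- spark_connect(master = "local")

# Copy two R data frames into the cluster. The table names "flights" and
# "batting" are illustrative; overwrite = TRUE lets the script be re-run
# against the same connection.
flights_tbl <- copy_to(sc, nycflights13::flights, "flights", overwrite = TRUE)
batting_tbl <- copy_to(sc, Lahman::Batting, "batting", overwrite = TRUE)

# List the tables now visible through the connection.
src_tbls(sc)

# A simple filter: flights that departed exactly two minutes late.
# The result is a lazy reference to a Spark query, not a local data frame.
delayed_by_two <- flights_tbl %>% filter(dep_delay == 2)
delayed_by_two
```

Because the result is lazy, nothing is pulled into R until you print it or call collect(). When you are finished, spark_disconnect(sc) closes the connection.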
Apr-17-2020, 14:20:54 GMT