AITopics | sparklyr

sparklyr

Tidy Time Series Forecasting in R with Spark

#artificialintelligence | Oct-20-2021, 15:24:54 GMT

I'm SUPER EXCITED to show fellow time-series enthusiasts a new way to scale time series analysis using an amazing technology called Spark! Without Spark, large-scale forecasting projects of 10,000 time series can take days to run because of long-running for-loops and the need to test many models on each time series. Spark has been widely accepted as a "big data" solution, and we'll use it to scale out (distribute) our time series analysis to Spark clusters and run it in parallel. Spark is an amazing technology for processing large-scale data science workloads. Modeltime is a state-of-the-art forecasting library that I personally developed for "Tidy Forecasting" in R. Modeltime now integrates a Spark backend capable of forecasting 10,000 time series using distributed Spark clusters.

forecasting, modeltime, time series, (10 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.58)

Looking to the future for R in Azure SQL and SQL Server - Microsoft SQL Server Blog

#artificialintelligence | Jul-1-2021, 03:02:21 GMT

Data science, machine learning, and analytics have redefined how we look at the world. The R community plays a vital role in that transformation, and the R language continues to be the de facto choice for statistical computing, data analysis, and many machine learning scenarios. The importance of R was first recognized by the SQL Server team back in 2016 with the launch of SQL ML Services and R Server. Since then we added Python to SQL ML Services in 2017 and Java support through our language extensions in 2019. Earlier this year we also announced the general availability of SQL ML Services in Azure SQL Managed Instance.

azure sql, server, sql server, (9 more...)

Technology:

Information Technology > Databases (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

sparklyr/sparklyr

#artificialintelligence | Apr-17-2020, 14:20:54 GMT

You can connect to both local instances of Spark and remote Spark clusters. Here we'll connect to a local instance of Spark via the spark_connect function. The returned Spark connection (sc) provides a remote dplyr data source to the Spark cluster. For more information on connecting to remote Spark clusters, see the Deployment section of the sparklyr website. We can now use all of the available dplyr verbs against the tables within the cluster. We'll start by copying some datasets from R into the Spark cluster (note that you may need to install the nycflights13 and Lahman packages in order to execute this code). To start with, here's a simple filtering example. Introduction to dplyr provides additional dplyr examples you can try.

interface, spark cluster, sparklyr, (14 more...)

Genre: Press Release (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Comparison of ML Classifiers Using Sparklyr

#artificialintelligence | Jan-29-2017, 00:15:23 GMT

You can use sparklyr to run a variety of classifiers in Apache Spark. For the Titanic data, the best-performing models were tree-based. Gradient boosted trees were among the best models, but also had a much longer average run time than the others. Random forests and decision trees both had good performance and fast run times. While these models were run on a tiny data set in a local Spark cluster, the same methods will scale to analysis of data in a distributed Apache Spark cluster.

decision tree learning, machine learning, ml classifier, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.76)

sparklyr -- R interface for Apache Spark

#artificialintelligence | Nov-9-2016, 13:35:15 GMT

H2O Sparkling Water supports a wide array of algorithms, and as illustrated above it's easy to chain these functions together with dplyr pipelines. To learn more see the H2O Sparkling Water section.

artificial intelligence, machine learning, partition, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

sparklyr -- R interface for Apache Spark

#artificialintelligence | Oct-16-2016, 13:15:41 GMT

H2O Sparkling Water supports a wide array of algorithms, and as illustrated above it's easy to chain these functions together with dplyr pipelines. To learn more see the H2O Sparkling Water section.

artificial intelligence, machine learning, partition, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

R Addict Blog

#artificialintelligence | Aug-26-2016, 06:55:32 GMT

Machine and statistical learning wizards are becoming more eager to perform analysis with the Spark ML library whenever possible. It's trendy, posh, spicy and gives the feeling of doing state-of-the-art machine learning and being up to date with the newest computational trends. It is even more sexy and powerful when computations can be performed on an extraordinarily enormous computation cluster: say, 100 machines on a YARN Hadoop cluster make you a real data cruncher! In this post I present the sparklyr package (by RStudio), the connector that will transform you from a regular R user, to the supa! Moreover, I present how I have extended the interface to the K-means procedure, so that it is now also possible to compute the cost for that model, which can help in determining the number of clusters in segmentation problems.

algorithm, artificial intelligence, machine learning, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
