Beginning in 2016, Microsoft rolled out a preview of Microsoft R Server (MRS) for Azure HDInsight clusters. Recent blog posts (by Max Kaznady and David Smith) have highlighted how to use and tune this service for large scale machine learning tasks. In this post, we push the envelope and show how to build an end-to-end fully operationalized analytics pipeline using Azure Data Factory (ADF) and MRS with HDInsight (specifically Apache Spark). By integrating Azure Data Factory with Microsoft R Server and Spark, we show how to configure a scalable training and testing pipeline that operates on large volumes of data.
Sep-10-2017, 01:00:07 GMT