
Microsoft Now Developing Its Own Hadoop


Hadoop might be dead, but that's not stopping public cloud providers from using it. The latest to make a move is Microsoft Azure, which in July announced that it would begin developing its own distribution under its HDInsight brand. Microsoft, of course, has been providing Hadoop software on its Azure cloud for many years. It was an early partner of Hortonworks, and basically had an OEM version of the Hortonworks Data Platform (HDP) for the cloud that it called HDInsight. But Hortonworks merged with Cloudera in early 2019, and the HDP product line is no longer being developed, although it is still being supported by Cloudera, along with its legacy Hadoop distribution, Cloudera Distribution including Hadoop (CDH), until at least 2022.

Top 8 reasons to choose Azure HDInsight


Household names such as Adobe, Jet, ASOS, Schneider Electric, and Milliman are among hundreds of enterprises that power their big data analytics with Azure HDInsight. Azure HDInsight launched nearly six years ago and has since become the best place to run Apache Hadoop and Spark analytics on Azure. We monitor the cluster and all its services, detect and repair common issues, and respond to issues 24/7. Your big data applications can run more reliably because the HDInsight service monitors cluster health and automatically recovers from failures. Isolate your HDInsight cluster within VNETs and take advantage of transparent data encryption.

Spark with HDInsight - Enterprise Ready Machine Learning and Interactive Data Analysis at Scale - Silicon Valley, CA


In particular, Spark is well suited to machine learning and interactive data workloads, and can provide an order of magnitude greater performance than traditional Hadoop data processing tools. In this course, we will take a deep dive into Spark as a framework, examine its design and how to use that design to best effect, and learn how to develop effective machine learning applications with Spark on HDInsight. The course covers the fundamentals of Spark, its core APIs and design, relational data processing with Spark SQL, and the fundamentals of Spark job execution, performance tuning, tracking, and debugging. Users will get hands-on experience processing streaming data with Spark Streaming and training machine learning algorithms with Spark ML and R Server on Spark. The course also covers HDInsight configuration and platform-specific considerations such as remote development and access with Livy and IntelliJ, secure Spark, multi-user notebooks with Zeppelin, and virtual networking with other HDInsight clusters.