Microsoft has overhauled its cloud-hosted Azure HDInsight Hadoop big data mill with extra security in the shape of enhanced authentication and identity management features, plus a claimed 25 times performance boost in crunching big data queries. Azure HDInsight is a service that lets users deploy and manage Apache Hadoop clusters on Microsoft's Azure cloud, and has been developed in partnership with Hadoop specialist Hortonworks using the latter's Hortonworks Data Platform. It was also updated with support for Apache Spark just a few months back, adding in-memory processing to speed up analytics jobs. Much of the underlying framework of Azure HDInsight is thus open source software, which Redmond is very much in favour of these days. In fact, the firm claims it has played an important part in making the Apache Hive data warehouse tool run faster, and this is where the significant performance gains have come from, thanks to something called Live Long and Process (LLAP) functionality.
Household names such as Adobe, Jet, ASOS, Schneider Electric, and Milliman are amongst hundreds of enterprises that are powering their big data analytics using Azure HDInsight. Azure HDInsight launched nearly six years ago and has since become the best place to run Apache Hadoop and Spark analytics on Azure. We will monitor the cluster and all its services, detect and repair common issues, and respond to issues 24/7. Your big data applications can run more reliably because the HDInsight service monitors cluster health and automatically recovers from failures. Isolate your HDInsight cluster within VNETs and take advantage of transparent data encryption.
When Microsoft first dipped its toes into the Hadoop waters, it worked with Hortonworks to port Hadoop to Windows and run it in the Azure cloud. But running Hortonworks Data Platform (HDP) for Windows meant HDInsight (as Hadoop on Azure was eventually branded) was always a step behind the more mainstream Linux distributions, and constantly playing catch-up. When Microsoft decided to offer HDInsight clusters running on Linux, everything changed. Support from across the industry materialized and the newest Hadoop features were added to the service much more quickly. Still, HDInsight has been due for a polishing, and today Microsoft is announcing just that.
Last week, Microsoft held its annual Connect() event in New York City, at an event space right at the mouth of the Holland Tunnel. Connect() tends to be focused on Visual Studio and the application development stack. But just as the Holland Tunnel joins a hip part of Manhattan to Jersey City, NJ, Connect() tied together the dev stack announcements with a bunch of announcements around the Microsoft Data Platform. Microsoft had two huge announcements around SQL Server, arguably the Data Platform's component tied most closely to the developer world. But it also had announcements in the worlds of Big Data and analytics, specifically around Azure Data Lake, R Server, HDInsight and Apache Kafka.