It's a bit of an adage in the software world that when a product gets to its third version, it really hits its stride. First versions are usually what we now call minimum viable product (MVP) releases; 2.0 releases typically add enough functionality to address some of the more egregious v1 pain points. But 3.0 releases tend to fit and finish, and often bring one or two important new feature sets. Such is the case with version 3.0 of Hortonworks Data Platform (HDP), being announced this morning at Hortonworks' DataWorks Summit in San Jose, CA. HDP 3.0 is itself based on version 3.1 of Apache Hadoop, which does indeed include important new areas of functionality.
In some ways Hortonworks is old-fashioned in that it still clings to the stretch goal of managing half of the world's data in an era when cloud object stores and bespoke analytic services are adding more alternatives to the mix. Hortonworks' aspirational goal may not be realistic, but never mind, there are bigger fish to fry. The underlying message from this year's North American DataWorks Summit and analyst briefings is that the company is competing and facing the challenges of navigating a multipolar cloud world. My Big on Data bro Andrew Brust reported the headlines coming out earlier in the week: Hortonworks is releasing the 3.0 version of its data platform that, confusingly, is based on Hadoop 3.1. As we reported back at the start of the year, the 3.x generation of Apache Hadoop will mark a watershed with containerization and storage.
For data platform providers, Amazon is the ultimate frenemy. If you're trying to have a major cloud market presence, the Amazon cloud is almost impossible to avoid. So it's not surprising that Hadoop providers are increasingly making friends with Amazon Web Services (AWS) - and Microsoft Azure. For Hortonworks, roughly a quarter of its customers are deploying in the cloud for some or all of their workloads. Until now, its primary cloud presence has been as the Hadoop engine of Azure's HDInsight big data service.
It would be pure understatement to say that the world has changed since Hadoop debuted just over a decade ago. Rewind the tape to 5 to 10 years ago, and if you wanted to work with big data, Hadoop was pretty much the only platform game in town. Open source software was the icing on the cake of cheap compute and storage infrastructure that made processing and storing petabytes of data thinkable. Since then, storage and compute have continued to get cheaper. But so has bandwidth, as 10 GbE connections have supplanted the 1 GbE connections that were the norm a decade ago.
With the announcement this week of its cloud-based DataPlane Service, Hortonworks Inc. is now firmly seated on the simplicity bandwagon. The enterprise-scale offering is designed to provide an easier way for organizations to govern and analyze data, no matter where it may reside. "The goal is to keep making it simpler and easier for the customer to get to the cloud, bring machine learning and data science models to the data, and make it easy for the consumption of the next generation of applications," said Rob Bearden (pictured, left), chief executive officer of Hortonworks. Bearden visited theCUBE, SiliconANGLE's mobile livestreaming studio, and spoke with co-hosts John Furrier (@furrier) and Peter Burris (@plburris) during the BigData NYC conference in New York City. He was joined by Rob Thomas (pictured, right), general manager of IBM Analytics at IBM Corp., and they discussed how the new service solves customer pain points, enterprise interest in multicloud tools and a future focus on governance.