It's a bit of an adage in the software world that when a product gets to its third version, it really hits its stride. First versions are usually what we now call minimum viable product (MVP) releases; 2.0 releases typically add enough functionality to address some of the more egregious v1 pain points. But 3.0 releases tend to add fit and finish, and often bring one or two important new feature sets. Such is the case with version 3.0 of Hortonworks Data Platform (HDP), being announced this morning at Hortonworks' DataWorks Summit in San Jose, CA. HDP 3.0 is itself based on version 3.1 of Apache Hadoop, which does indeed include important new areas of functionality.
In some ways Hortonworks is old-fashioned, in that it still clings to the stretch goal of managing half of the world's data in an era when cloud object stores and bespoke analytic services are adding more alternatives to the mix. Hortonworks' aspirational goal may not be realistic, but never mind; there are bigger fish to fry. The underlying message from this year's North American DataWorks Summit and analyst briefings is that the company is competing and facing the challenges of navigating a multipolar cloud world. My Big on Data bro Andrew Brust reported the headlines earlier in the week: Hortonworks is releasing the 3.0 version of its data platform that, confusingly, is based on Hadoop 3.1. As we reported back at the start of the year, the 3.x generation of Apache Hadoop will mark a watershed with its advances in containerization and storage.
For data platform providers, Amazon is the ultimate frenemy. If you're trying to have a major cloud market presence, the Amazon cloud is almost impossible to avoid. So it's not surprising that Hadoop providers are increasingly making nice with Amazon AWS - and Microsoft Azure. For Hortonworks, roughly a quarter of its customers are deploying some or all of their workloads in the cloud. Until now, its primary cloud presence has been as the Hadoop engine of Azure's HDInsight big data service.
It would be an understatement to say that the world has changed since Hadoop debuted just over a decade ago. Rewind the tape 5 to 10 years, and if you wanted to work with big data, Hadoop was pretty much the only platform game in town. Open source software was the icing on the cake of cheap compute and storage infrastructure that made processing and storing petabytes of data thinkable. Since then, storage and compute have continued to get cheaper. But so has bandwidth, as 10 GbE connections have supplanted the 1 GbE connections that were the norm a decade ago.
Cloudera and Hortonworks jointly announced that they have entered into a definitive agreement under which the companies will combine in an all-stock merger of equals, valued at $5.2 billion. Under the terms of the agreement, Cloudera stockholders will own approximately 60 percent of the equity of the combined company, and Hortonworks stockholders will own approximately 40 percent. While the two have been fierce competitors, this merger (if approved) will raise the bar on innovation in the big data space, especially in supporting an end-to-end big data strategy in hybrid and multi-cloud environments.