The demand for real-time data processing is rising, and streaming vendors are proliferating and competing. Apache Kafka is a key component in many data pipeline architectures, mostly due to its ability to ingest streaming data from a variety of sources in real time. Confluent, the commercial entity behind Kafka, aims to leverage this position to become a platform of choice for real-time application development in the enterprise. On the road to implementing this vision, Kafka has expanded its reach beyond data ingestion -- most notably, into processing. In the process, its overlap with other platforms is growing, and Confluent seems set on adding features that will help Kafka stand out.
With the release of Apache Kafka 1.0 this week, an eight-year journey is finally coming to a temporary end. Temporary, because the project will continue to evolve, with near-term bug fixes and long-term feature updates. But for Neha Narkhede, Chief Technology Officer of Confluent, this release is the culmination of work toward a vision she and a team of engineers first laid out in 2009. Back then, a team at LinkedIn decided it had the solution to a major data stream processing problem. Narkhede said the originators of Kafka began their journey by sitting down and trying to understand why stream processing companies founded in the 1990s and 2000s had failed.
The Streams API of Apache Kafka is the easiest way to write mission-critical real-time applications and microservices with all the benefits of Kafka's server-side cluster technology. It allows you to build standard Java or Scala applications that are elastic, highly scalable, and fault-tolerant, and that require no separate processing cluster. Applications can be deployed in containers, on VMs, or on bare-metal hardware, in the cloud or on premises. The Confluent Platform manages the barrage of streaming data and makes it available throughout an organization. It provides industries ranging from retail, logistics, and manufacturing to financial services and online social networking with a scalable, unified, real-time data pipeline that enables applications from large-volume data integration to big data analysis with Hadoop to real-time stream processing.
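To make the "standard Java application, no separate processing cluster" point concrete, here is a minimal sketch of a Kafka Streams application. The topic names, application id, and broker address are illustrative assumptions, not taken from the article; the transformation itself is a trivial uppercase map, just to show the read-transform-write shape.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseApp {
    public static void main(String[] args) {
        // Basic configuration; the application id also serves as the consumer group id.
        // Values below are illustrative placeholders.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Build a simple topology: read from one topic, transform, write to another.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic");   // hypothetical topic
        source.mapValues(value -> value.toUpperCase())
              .to("output-topic");                                        // hypothetical topic

        // The application runs as a plain Java process; scaling out means
        // starting more instances, not provisioning a processing cluster.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Elasticity and fault tolerance come from Kafka itself: partitions of the input topic are distributed across whatever instances of this process happen to be running.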
Confluent, the vendor offering a commercial version of and services for the open source Apache Kafka platform, just received $125 million in funding. This has launched Confluent into unicorn territory with a valuation of $2.5 billion. You probably know this by now, and if you've been following, you also know why and how Confluent got to this point. ZDNet has been keeping track of Kafka and Confluent's evolution, and the news was a good opportunity to catch up with Jay Kreps, Confluent CEO. Here is the lowdown on how Kafka will evolve from here, the latest updates on the data streaming landscape, and, last but not least, what this all means for the cloud and open-source software.
The ability to ingest and process large volumes of data in real time is of interest to a growing number of organizations. This is an area seeing rapid growth, as the use cases can translate to direct business benefits. We have been following this space for a while now, and the release of Apache Samza 1.0 is a good opportunity to revisit it and see how this changes things, if at all. Apache Samza was developed at LinkedIn in 2013, became a top-level Apache project in 2014, and is now used by over 3,000 applications in production at LinkedIn.