Apache Kafka, the open-source distributed messaging system, has steadily carved a foothold as the de facto real-time standard for brokering messages in scale-out environments. And if you think you have seen this opener before, it's because you have. It was fellow ZDNet contributor Tony Baer's opener for his piece commenting on a Kafka usage survey in July, and you've probably read something along these lines elsewhere, or had that feeling yourself. Yes, Kafka is on most whiteboards, but mostly the whiteboards of early adopters -- that was the gist of Baer's analysis. With Kafka Summit kicking off today in San Francisco, we took the opportunity for a chat with Jay Kreps, Kafka co-creator and Confluent CEO, on all things Kafka, as well as the broader landscape.
The demand for real-time data processing is rising, and streaming vendors are proliferating and competing. Apache Kafka is a key component in many data pipeline architectures, mostly due to its ability to ingest streaming data from a variety of sources in real time. Confluent, the commercial entity behind Kafka, has the ambition to leverage this position to become a platform of choice for real-time application development in the enterprise. On the road to implementing this vision, Kafka has expanded its reach to include more than data ingestion -- most notably, processing. In this process, the overlap with other platforms is growing and Confluent seems set on adding features that will enable Kafka to stand out.
Both approaches have their pros and cons. The blog post Machine Learning and Real-Time Analytics in Apache Kafka Applications and the Kafka Summit presentation Event-Driven Model Serving: Stream Processing vs. RPC with Kafka and TensorFlow discuss this in detail. In more and more applications, the analytic model is embedded directly into the event streaming application, making it robust, decoupled, and optimized for performance and latency. The model can be loaded into the application at startup (e.g., using the TensorFlow Java API). Model management (including versioning) depends on your build pipeline and DevOps strategy. For example, a new model can be embedded into a new Kubernetes pod, which simply replaces the old pod. Another commonly used option is to send newly trained models (or just the updated weights or hyperparameters) as a Kafka message to a Kafka topic.
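To make the second option concrete, here is a minimal sketch of the model-swap pattern: the streaming application keeps the current model in memory and atomically replaces it when an update message arrives on a (hypothetical) "model-updates" topic. The broker interaction is stubbed out so the swap logic stands on its own; in practice the update handler would be called from a Kafka consumer poll loop, and the toy linear model stands in for a real TensorFlow model.

```python
import json
import threading

class ModelServer:
    """Holds the current model and swaps it atomically on update.

    In a real deployment, on_model_update() would be fed message values
    consumed from a Kafka topic; here we call it directly to keep the
    example self-contained.
    """

    def __init__(self, initial_weights):
        self._lock = threading.Lock()
        self._weights = initial_weights
        self.version = 0

    def predict(self, x):
        # Toy linear model: score = w0 + w1 * x.
        with self._lock:
            w = self._weights
        return w[0] + w[1] * x

    def on_model_update(self, message_value: bytes):
        # A message carrying new weights plus a version for ordering;
        # stale (out-of-order) updates are ignored.
        update = json.loads(message_value)
        with self._lock:
            if update["version"] > self.version:
                self._weights = update["weights"]
                self.version = update["version"]

server = ModelServer(initial_weights=[0.0, 1.0])
print(server.predict(2.0))  # scores with the initial weights

# Simulate a message consumed from the model-updates topic.
server.on_model_update(
    json.dumps({"version": 1, "weights": [1.0, 3.0]}).encode())
print(server.predict(2.0))  # scores with the hot-swapped weights
```

The lock keeps reads and swaps consistent while event processing continues; serving never pauses for a redeploy, which is the main appeal of this pattern over the pod-replacement approach.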
Confluent, the vendor offering a commercial version of and services for the open-source Apache Kafka platform, just received $125 million in funding. This has launched Confluent into unicorn territory with a valuation of $2.5 billion. You probably know this by now, and if you've been following, you also know why and how Confluent got to this point. ZDNet has been keeping track of Kafka and Confluent's evolution, and the news was a good opportunity to catch up with Jay Kreps, Confluent CEO. Here is the lowdown on how Kafka will evolve from now on, the latest updates on the data streaming landscape, and last but not least, what this all means for the cloud and open-source software.
Confluent cofounders Neha Narkhede, CEO Jay Kreps and Jun Rao want to help companies use Kafka in the cloud. High-flying startup Confluent is bringing its open-source technology Apache Kafka to the cloud. In the years since its founders devised Kafka while at LinkedIn in 2010, the data streaming software has become one of tech's most popular ways to manage large amounts of data when it's needed fast. Investors have poured $80 million into Confluent, the company launched by Kafka's creators, valuing the buzzy startup at more than $530 million, according to data from PitchBook. As with any tech company built off an open-source project -- just ask Docker, another high-flyer that recently brought on a third CEO -- scaling a lasting and lucrative business off Kafka has trailed behind the popularity of its free version.