The demand for real-time data processing is rising, and streaming vendors are proliferating and competing. Apache Kafka is a key component in many data pipeline architectures, largely because of its ability to ingest streaming data from a variety of sources in real time. Confluent, the commercial entity behind Kafka, aims to leverage this position to become a platform of choice for real-time application development in the enterprise. On the road to implementing this vision, Kafka has expanded its reach beyond data ingestion, most notably into processing. As it does so, its overlap with other platforms grows, and Confluent seems set on adding features that will help Kafka stand out.
Confluent cofounders Neha Narkhede, Jay Kreps (the company's CEO) and Jun Rao want to help companies use Kafka in the cloud. The high-flying startup is bringing its open-source technology, Apache Kafka, to the cloud. In the years since its founders devised Kafka at LinkedIn in 2010, the data-streaming software has become one of tech's most popular ways to manage large amounts of data when it's needed fast. Investors have poured $80 million into Confluent, the company launched by Kafka's creators, valuing the buzzy startup at more than $530 million, according to data from PitchBook. As with any tech company built off an open-source project (just ask Docker, another high-flyer that recently brought on a third CEO), scaling a lasting and lucrative business off Kafka has trailed behind the popularity of its free version.
The day when armies of business analysts can query incoming data in real time may be drawing closer. Supporting such continuous interactive queries is a goal of KSQL, software put forward this week by Confluent Inc., whose founders originated the Kafka data-streaming software. KSQL is a SQL engine that directly handles Apache Kafka data streams. The company also said KSQL is intended to broaden the use of Kafka beyond Java and Python, opening up Kafka programming to developers familiar with SQL, although the form of SQL Confluent is using here is a dialect, one the company has developed to deal with the unique architecture of Kafka streaming. The software is appearing first as a developer preview, and it will be available under an Apache 2.0 license, according to the company. Created at LinkedIn, Kafka began life as a publish-and-subscribe messaging system that focused on handling log files as system events.
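To illustrate the kind of continuous query KSQL targets, a sketch in its SQL dialect might look like the following. The `pageviews` topic and its columns are hypothetical examples, and the exact syntax may differ in the developer preview:

```sql
-- Declare a stream over an existing Kafka topic (topic name and schema are
-- hypothetical for this sketch).
CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
  WITH (kafka_topic = 'pageviews', value_format = 'JSON');

-- A continuous query: count page views per user over one-minute tumbling
-- windows. Unlike a one-shot SQL query, this keeps running and emits updated
-- results as new records arrive on the stream.
SELECT userid, COUNT(*)
  FROM pageviews
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY userid;
```

The key departure from conventional SQL is that the result set is never "done": the query stands continuously against the stream, which is the dialect change Confluent made to fit Kafka's streaming architecture.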
This guest post comes from Neha Narkhede, co-founder and CTO at Confluent, a startup founded by Apache Kafka's creators and focused on the technology. Data systems in the modern world aren't islands that stand on their own; data often flows between databases, offline data stores and search systems, as well as to stream processing systems. But for a long time, data technology in companies was fairly homogeneous; data mostly resided in two popular locations: operational data stores and the data warehouse. And a substantial portion of the data collection and processing that companies did ran as big batch jobs: CSV files dumped out of databases, log files collected at the end of the day, and so on. But businesses operate in real time, and the software they run is catching up.
Wouldn't it be great if working with streaming data were just as simple as working with data at rest? And imagine if the two could be modeled, processed and coded against similarly; that would let organizations doing analytics broaden the scope of their work to include real-time streaming analytics too. We're not quite there yet, but Kafka Streams, a lightweight Java library that works with the Apache Kafka stream data platform, gets us closer by empowering mainstream Java developers. And today, with the release of Confluent Data Platform 3.0, Kafka Streams has reached general availability (it had been released in preview form in Confluent Data Platform 2.0).

How it works; where it's useful

At the risk of oversimplifying things, Kafka Streams makes streaming data look like a conventional table of key-value pairs (the data structure is called a KTable).
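To make the KTable idea concrete, here is a minimal, dependency-free Java sketch, not the actual Kafka Streams API, of how a changelog stream of key-value updates materializes into a table holding the latest value per key (the class and method names are invented for illustration):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KTableSketch {

    // Materialize a changelog stream into a table: a later record for the
    // same key overwrites the earlier one, which is the core KTable idea.
    public static Map<String, Integer> materialize(List<Map.Entry<String, Integer>> changelog) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : changelog) {
            table.put(record.getKey(), record.getValue());
        }
        return table;
    }

    public static void main(String[] args) {
        // A stream of (user, click-count) updates arriving over time.
        List<Map.Entry<String, Integer>> stream = List.of(
                Map.entry("alice", 1),
                Map.entry("bob", 1),
                Map.entry("alice", 2)); // the newer value for "alice" wins

        System.out.println(materialize(stream)); // {alice=2, bob=1}
    }
}
```

In Kafka Streams itself the same duality appears as the `KTable` abstraction, with updates flowing continuously from Kafka topics rather than from a finite in-memory list as in this sketch.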