In this article, perhaps the first in a mini-series, I want to explain the concepts of streams and tables in stream processing and, specifically, in Kafka. Hopefully, you will walk away with not only a better theoretical understanding but also more tangible insights and ideas that will help you solve your current or next practical use case better, faster, or both. Some users have a stream processing or Kafka background, some have their roots in RDBMS like Oracle and MySQL, and some have neither. One common question is, "What's the difference between streams and tables?" In this article I want to give both a short TL;DR answer and a longer answer so that you can get a deeper understanding. Some of the explanations below will be slightly simplified, because that makes them easier to understand and easier to remember (much like how Newton's simpler but less accurate model of gravity is perfectly sufficient for most daily situations, saving you from having to jump straight to Einstein's relativity; fortunately, stream processing is never that complicated anyway).
In simple words, Kafka Streams is a library that you can include in your Java-based applications to build stream processing applications on top of Apache Kafka. Other distributed computing platforms such as Apache Spark and Apache Storm are widely used in the big data stream processing world, but Kafka Streams brings some unique propositions to this area. Kafka Streams provides a State Store feature with which applications can store their local processing results (the state). RocksDB is used as the default state store, and it can run in persistent or in-memory mode. In our sample application, the state we care about is the count of occurrences of the keywords we chose to follow. How is this implemented? Oracle Application Container Cloud provides access to a scalable in-memory cache, which serves as the custom state store in our use case. It is also possible to scale our stream processing service elastically, in both directions (details in the documentation).
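To make the state-store idea concrete, here is a minimal, self-contained sketch of the stateful keyword counting described above. It deliberately avoids the actual Kafka Streams API (no `StreamsBuilder`, no RocksDB): a plain `HashMap` stands in for the state store, and the class name, method names, and sample messages are all invented for illustration. In a real Kafka Streams application, the same logic would be expressed as a topology whose count state is backed by the default RocksDB store or a custom store such as the in-memory cache mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a local map plays the role of the state store that
// Kafka Streams would manage for us (e.g. backed by RocksDB).
public class KeywordCountSketch {
    private final Map<String, Long> stateStore = new HashMap<>();

    // Process one incoming message: bump the count for every
    // tracked keyword that the message mentions.
    public void process(String message, String... keywords) {
        for (String keyword : keywords) {
            if (message.toLowerCase().contains(keyword.toLowerCase())) {
                stateStore.merge(keyword, 1L, Long::sum);
            }
        }
    }

    // Read the current count for a keyword from the "state store".
    public long countFor(String keyword) {
        return stateStore.getOrDefault(keyword, 0L);
    }

    public static void main(String[] args) {
        KeywordCountSketch app = new KeywordCountSketch();
        app.process("Kafka Streams is a library", "kafka", "streams");
        app.process("Streams and tables in Kafka", "kafka", "streams");
        System.out.println(app.countFor("kafka")); // prints 2
    }
}
```

The key point the sketch captures is that the state lives next to the processing logic and is updated incrementally per message, which is exactly what makes a pluggable state store (RocksDB, an in-memory cache, etc.) useful.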
There can actually be a number of steps in ESP processing, such as filtering, splitting into multiple streams, creating notifications, joining with existing data, and applying business rules or scoring algorithms, all of which happen 'in memory' at the 'edge' of the system before the data is passed into storage.
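The steps above can be sketched as a small in-memory pipeline. This is an illustrative example, not any particular ESP product's API: the record types, the reference-data join, and the scoring rule (flagging amounts over 1000) are all assumptions made up for the sketch. It simply shows filtering, a join with existing data, and a business rule applied before anything would be written to storage.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative in-memory pipeline: filter -> join with reference data -> score.
public class EspPipelineSketch {
    // Hypothetical event and output types, invented for this sketch.
    record Event(String userId, double amount) {}
    record Scored(String userId, String region, double amount, boolean flagged) {}

    public static List<Scored> process(List<Event> events,
                                       Map<String, String> regionsByUser) {
        return events.stream()
                .filter(e -> e.amount() > 0)                 // filtering step
                .map(e -> new Scored(                        // join + business rule
                        e.userId(),
                        regionsByUser.getOrDefault(e.userId(), "unknown"),
                        e.amount(),
                        e.amount() > 1000))                  // simple scoring rule
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("u1", 1500),
                new Event("u2", -5),   // dropped by the filter
                new Event("u3", 200));
        List<Scored> out = process(events, Map.of("u1", "EU", "u3", "US"));
        System.out.println(out.size()); // prints 2
    }
}
```

Everything here happens in memory before any storage is involved, which is the point being made about processing at the edge of the system.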