Event streams always need Flink, or …not really?

We believe there are many use cases where Nussknacker can bring significant value without operational overhead caused by Flink or other sophisticated stream processing engines

Stream processing - what can it mean? The term is quite broad. There are complex, powerful stream processing engines, some of the most known include:

  • Apache Flink
  • Hazelcast Jet
  • Materialized

They can scale up to billions of events per sec and handle terabytes of state with exactly-once guarantees. However, the power comes with a cost - they are not so easy to maintain and operate. What’s more, developing solutions (i.e. creating and modifying jobs) requires quite specific development skills (some of them are simpler to grasp for an average Joe-developer - like Kafka Streams, but that’s a different story). Of course, SQL and other DSLs are making it easier, but maintaining a production cluster is still not for the faint-hearted.

But there is another type of streaming solution - less spectacular, maybe a bit mundane - but still extremely valuable from a business POV It contains various cases where:

  • You don’t have vast amounts of stateful computations
  • Or very large amounts of data, that cannot be handled with a database

Examples include:

  • Marketing campaigns where you react to each event separately, without the need for complex event patterns
  • NextBestAction - how to recommend the most beneficial offer to the customer (remaining profitable enough…)

When we looked at some of Nussknacker deployments, we noticed that users didn’t use fancy Flink stuff - like aggregations, streaming joins and so on - at least, they didn’t use it often. The rules themselves were the most complex part, but from the technical point of view, they were just a bunch of if-then-else statements and simple enrichments.

It turns out that to handle such use cases you don’t actually need anything fancy - Kafka-based (micro)service will do. In fact, using Kafka alone can give you surprisingly much:

  • Scaling and HA - as consumers are rebalanced automatically by the cluster coordinator
  • Exactly-once delivery - using transactions (though you need to be careful here and understand various guarantees and tradeoffs)
  • Automatic backpressure handling - as consumers are pull-based by their nature

Now, you may wonder - what about the state, enrichments with exactness guarantees? Well, in many cases, especially if you have moderate traffic you can go surprisingly far with good old cache - like Redis or Ignite. In one of our deployments, we enrich Kafka streams with customers' data from Redis around 500k/s - using just one Redis server and Vertex-based API.

Of course, you probably won’t have real-time, up-to-date knowledge of customers’ history. But, let’s be honest - in many cases having access to data that is, for example, 1 hour old is more than enough. What’s more, you probably already have pipelines that feed those data to some DBs, all you need to do is to put them also to Redis or Ignite. It’s not that kappa architecture is overrated - for places where you need exact computations (e.g. some user stats that determine their level or billing) it’s probably the best choice - but there is still a lot of places where more lax requirements are perfectly fine.

To wrap up: we believe there are many use cases where Nussknacker can bring significant value without operational overhead caused by Flink or other sophisticated stream processing engines. Stay tuned, as in the next entries we’ll cover more technical details on how we want to deliver on this belief :)