Is SQL enough to democratise stream processing?

I am back from Current 2022 (previously Kafka Summit), probably the biggest conference about (data) streaming. There was one strong message coming from all the talks and meet-ups: processing data in motion rather than at rest drastically changes the way we interact with data and, as a result, allows us to do things that were not possible before.

I strongly recommend watching the opening keynote. It is inspiring, especially the first half (~20 minutes). The idea of streaming becoming part of the enterprise nervous system is very powerful. The talk goes much, much further than a recap of what streaming is - if you are wondering whether streaming is for you, this talk may be good food for thought.


Still, not enough attention is given to unlocking the data in streams for non-IT users. How can data streaming become part of the enterprise nervous system if acting on the data in streams is the domain of savvy developers only? Does the nervous-system metaphor sit well with development cycles measured in sprint durations (weeks?) rather than hours?

Streaming data to the people!

Domain experts, business analysts, etc., want to build algorithms that turn event data into actions. They understand the data and have an idea of how the algorithm might look. Often, a lot of trial and error is needed to get the algorithm right - a fraud detection algorithm is a good example. Domain experts are not developers; they will not write the required code themselves. How do people go about this? The established approach is to assemble a team of developers, delve into the Kafka, Flink, and Spark APIs, and produce a lot of Java, Python, and whatever-else code. This takes time and is not cheap. The turnaround time may be unacceptable for domain experts who want to try several hypotheses per day. Even worse, the work may turn repetitive and boring for the developers. Teams may get frustrated and unhappy. Clearly, we need another approach.


For me, there is one excellent analogy that suggests what the “new approach” should look like. Roughly 40 years ago, two breakthrough products arrived on the market - VisiCalc, which ran on the Apple II, and Lotus 1-2-3 from Lotus Development Corporation. Both of these products are referred to as killer apps. According to Wikipedia, a killer app is a computer program or software that is so necessary or desirable that it proves the core value of some larger technology, such as computer hardware, a video game console, software, a programming language, a software platform, or an operating system. Both VisiCalc and Lotus 1-2-3 were so popular and desirable that they contributed substantially to the sales of Apple and IBM personal computers. It is noteworthy that both VisiCalc and Lotus 1-2-3 were spreadsheet software - finally, non-programmers could crunch data without needing to know how to write code for the PC or the Apple II. This is a key point - people wanted to play with numbers on personal computers, and they did not shy away from the approach (more about it later) that the spreadsheet makers envisaged.

Will Confluent's Stream Designer democratise stream data processing?

Surprisingly, contenders for the Lotus 1-2-3 of the streaming era (with Nussknacker as the exception) were absent from the conference. Confluent made the democratisation of stream processing a topic of their keynote presentation, where they presented Stream Designer.

In his speech, Chad Verbowski, senior vice president of engineering at Confluent, noted that democratisation is one of the top two things requested by Confluent clients. OK, so what is Stream Designer? According to Confluent, it lets you “fast-track building pipelines powered by Apache Kafka® using a graphical canvas that’s extensible with SQL”.

Let’s strip away the marketing buzzwords - Stream Designer is just a visual SQL overlay adapted to streaming, nothing more. That is disappointingly little, and it is not the democratisation of streaming. Data pipelines are about data integration - that will remain the domain of data engineers, and I doubt they will want to use a graphical tool for it.


Characteristics of a streaming killer app


So what should the Lotus 1-2-3 for the streaming era look like? What is required to make it a killer app?

Clearly, it should be low-code - people who want to process stream data in one way or another should not be bothered with the technical details of Kafka, Flink, or Spark. In this context, my colleagues from the Nussknacker team came up with an interesting white paper on which business applications are a good fit for low-code and what the attributes of a good low-code tool are.

In the context of tools that have already tried to enter streaming via the low-code avenue, two additional attributes come to mind.

First is the tool's comprehensiveness (or lack thereof). Dealing with streaming data is not just about running streaming SQL queries - you need much more if what you want to build is anything beyond a simple streaming ETL pipeline. You want to be able to call REST APIs, perform DB lookups, and run ML models. With these three capabilities you can reach any piece of data on Earth and perform any transformation required. Without them, I would argue that the number of use cases such a tool fits is very limited.
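To make this concrete, here is a minimal Python sketch of what one logical step of such a pipeline does - enrich an event from several sources, then score it. The helpers `call_rest_api`, `db_lookup`, and `score_model` are hypothetical stand-ins for a real HTTP client, database driver, and model runtime; a real deployment would run this inside an engine such as Flink or Nussknacker.

```python
# Illustrative sketch only: each helper stands in for a real integration
# (an HTTP client, a database driver, an ML model runtime).

def call_rest_api(customer_id):
    # Stand-in for a REST call, e.g. to a customer-profile service
    return {"segment": "retail"}

def db_lookup(customer_id):
    # Stand-in for a database query fetching historical aggregates
    return {"avg_monthly_spend": 420.0}

def score_model(features):
    # Stand-in for a real fraud model; here a trivial threshold rule
    return 1.0 if features["amount"] > 10 * features["avg_monthly_spend"] else 0.1

def enrich_and_score(event):
    """One logical pipeline step: enrich the event, then score it."""
    features = {**event,
                **call_rest_api(event["customer_id"]),
                **db_lookup(event["customer_id"])}
    features["fraud_score"] = score_model(features)
    return features

result = enrich_and_score({"customer_id": "c-42", "amount": 5000.0})
print(result["fraud_score"])  # 1.0 - well above 10x the average spend
```

The point is not the toy logic but the shape of the step: without all three kinds of integration, a fraud-detection scenario like this cannot be expressed at all.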

Secondly, when I think about the Lotus 1-2-3 analogy, one additional, key characteristic comes to mind. Part of the spreadsheets' success is that, although low-code, they allow coding sophisticated data processing functionality. The programming language used by spreadsheets is oriented to the data domain, and it does not pretend to be anything other than a programming language.

In other words, spreadsheets are not ashamed that you need to enter formulas, understand data types, follow syntax rules, etc. Millions of people accept this - they know that if you want to crunch data, if you want to do something meaningful with the numbers you have, you need to be able to write some sort of code. It has to be code without boilerplate, though - if you want to do something simple, the code also needs to be simple. Think about what the “Hello world” program looks like in Excel - you just enter “Hello world” in a spreadsheet cell. Nothing more.
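A low-code streaming tool can apply the same idea: the engine owns all the plumbing, and the user writes only a short, boilerplate-free expression per step. The toy sketch below uses plain Python as the expression language purely for illustration - real tools (Nussknacker, for instance, uses SpEL) have their own, data-domain-oriented expression languages.

```python
# Toy sketch: the tool owns the plumbing; the user writes only the expression.
# Plain Python is used as the expression language here for illustration only.

def make_rule(expression):
    """Compile a one-line user expression into a predicate over events."""
    code = compile(expression, "<rule>", "eval")
    return lambda event: bool(eval(code, {}, {"event": event}))

# The whole "program" a domain expert would write is this single line:
is_suspicious = make_rule('event["amount"] > 1000 and event["country"] != "PL"')

print(is_suspicious({"amount": 2500, "country": "DE"}))  # True
print(is_suspicious({"amount": 500, "country": "DE"}))   # False
```

Like a spreadsheet formula, the expression is undeniably code - with types and syntax rules - but there is zero ceremony around it.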

Low-code in the data streaming domain

In summary, I think that a successful streaming low-code tool will use a streaming-domain programming language.

So the question is: are low-code, comprehensiveness, and a smartly selected domain language enough? Are these all the ingredients required for the streaming-era killer app? Let me know your thoughts - just reach me at zml(at)