Streaming
This page covers topics specific to the Streaming processing mode. Please read the common introduction before proceeding. The components that implement aggregates in time windows (Flink engine only) are covered here.
In the Streaming processing mode the scenario processes events. They are read from Kafka topics and processed by the engine of choice: Flink or Lite. At the end of the scenario, events that represent results or decisions are written to Kafka topic(s).
Prerequisites
To fully understand how Nussknacker works with Kafka topics, it's best to read the following first:
If you want to use the Flink engine, this is also recommended:
Concepts
Kafka topics are the native streaming data input to Nussknacker and the native output where the results of scenario processing are placed. In Nussknacker terminology, input topics are called sources and output topics are called sinks. This section discusses Kafka-based source and sink components.
It is not uncommon that programs that write events to Kafka topics (“producers”) and programs that read events from Kafka topics (“consumers”) are implemented by different people. If consumers are to understand what producers put into a topic, they need to agree on a data schema: the format and data types of the data transported over Kafka topics. This “contract” is kept in what is called a Schema Registry. Nussknacker is able to read it and use it to determine what data is in the event, and to help with the field-name and data-type validation of SpEL expressions. You can find some more details on schemas here; sample schemas can be found in the Quickstart.
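For illustration only, a producer and a consumer might agree on an Avro schema like the hypothetical Transaction record below (not one of the Quickstart schemas). Once such a schema is registered, Nussknacker can check that a SpEL expression such as `#input.amount` refers to an existing field of the right type.

```java
import org.apache.avro.Schema;

public class TransactionSchema {
    // A hypothetical producer/consumer contract; field names and types are
    // illustrative only. Text blocks require Java 15+.
    public static final Schema TRANSACTION = new Schema.Parser().parse("""
            {
              "type": "record",
              "name": "Transaction",
              "fields": [
                {"name": "clientId",  "type": "string"},
                {"name": "amount",    "type": "double"},
                {"name": "eventTime", "type": "long"}
              ]
            }
            """);
}
```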
Notion of time | Flink engine only
The notion of passing time is very important in real-time event processing. Please see the following excellent references to learn about the basic concepts:
For the Flink engine, the Flink documentation applies. Certain Nussknacker components make assumptions and come with predefined settings, so that end users don't have to configure everything themselves.
Sources and Sinks - Kafka
In general, the following rules apply:
- We use event time in scenarios to handle the notion of passing time
- Kafka record timestamps are used to assign event time to Flink events
- Kafka records produced by Nussknacker sinks carry the timestamp (in the event-time sense) of the event that generated them
- We use a bounded out-of-orderness watermark generator with a configurable amount of lateness (see the kafka.kafkaEspProperties.defaultMaxOutOfOrdernessMillis property in Configuration for details); a sketch of the equivalent raw Flink setup follows this list
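The semantics above can be sketched in raw Flink as follows. This is an illustrative equivalent, not Nussknacker's actual source code; the broker address, topic name, and the 60-second lateness are assumptions (in Nussknacker the lateness comes from defaultMaxOutOfOrdernessMillis).

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EventTimeFromKafka {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")      // assumption: local broker
                .setTopics("transactions")                  // hypothetical input topic
                .setGroupId("notion-of-time-demo")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        // Event time is taken from the Kafka record timestamp (the default for
        // KafkaSource). The bounded-out-of-orderness strategy emits watermarks
        // that lag the highest timestamp seen by 60 s, i.e. records arriving up
        // to 60 s late are still assigned to their event-time windows.
        DataStream<String> events = env.fromSource(
                source,
                WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(60)),
                "kafka-events");

        events.print();
        env.execute("event-time-demo");
    }
}
```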
Aggregations, window processing
If a new event is generated by e.g. a tumbling time window, its timestamp is equal to the time of the timer that fired, not the system time at the moment the event was emitted. See Aggregates in Time Windows for more details; a minimal raw-Flink sketch of this behaviour follows below.
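This sketch reuses the `events` stream from the example above; the 5-minute window and the per-value counting are arbitrary choices made for illustration.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Count events per value in 5-minute tumbling event-time windows. Each emitted
// Tuple2 carries the timestamp window.maxTimestamp() (window end minus 1 ms),
// i.e. the time of the timer that closed the window, regardless of the
// wall-clock time at which the window actually fired.
DataStream<Tuple2<String, Long>> counts = events
        .map(value -> Tuple2.of(value, 1L))
        .returns(Types.TUPLE(Types.STRING, Types.LONG))
        .keyBy(t -> t.f0)
        .window(TumblingEventTimeWindows.of(Time.minutes(5)))
        .sum(1);
```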
Notion of time | Lite engine only
The Lite engine is stateless, so many concepts important for windows or aggregations do not apply. However, the following rules hold for Kafka sources and sinks (a sketch follows this list):
- Kafka record timestamps are used to determine the time of the event
- Kafka records produced by Nussknacker sinks carry the timestamp of the event that generated them
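The timestamp pass-through can be sketched with the plain Kafka client API. The topic names and the uppercase transformation are hypothetical stand-ins for real scenario logic, not Nussknacker internals.

```java
import java.time.Duration;
import java.util.List;
import java.util.Locale;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TimestampPassThrough {
    // Assumes `consumer` and `producer` are already configured for the same
    // Kafka cluster; see the Kafka client documentation for setup.
    static void process(KafkaConsumer<String, String> consumer,
                        KafkaProducer<String, String> producer) {
        consumer.subscribe(List.of("transactions"));        // hypothetical input topic
        while (true) {
            for (ConsumerRecord<String, String> in : consumer.poll(Duration.ofMillis(500))) {
                // The output record reuses the input record's timestamp, so the
                // time of the event survives the hop through the scenario.
                producer.send(new ProducerRecord<>(
                        "decisions",                        // hypothetical output topic
                        null,                               // let Kafka pick the partition
                        in.timestamp(),                     // propagate the event timestamp
                        in.key(),
                        in.value().toUpperCase(Locale.ROOT))); // stand-in for scenario logic
            }
        }
    }
}
```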