Flink components
Sources, sinks and custom transformations are based on Flink API. In order to implement any of those you need to provide:
- a Flink
DataStreamSource
in case of sources and aDataStream
intoDataStreamSink
transformation in case of sinks - a Nussknacker specification
Sources
Standard implementation
The recommended way to implement a source is through StandardFlinkSource
interface. Your source only has to implement a sourceStream
method that provides DataStreamSource
based on StreamExecutionEnvironment.
This approach provides a standard transformation of the input value into a Nussknacker Context
and allows the implementation to customize these steps:
- Timestamp watermark handler so that events are correctly processed downstream, for example to avoid (or force!) dropping late events by aggregates. Read more about notion of time and watermarks
- Context initializer
to emit more variables than
#input
. For example built-in Kafka sources emit#inputMeta
variable with Kafka record metadata like: partition, topic, offset, etc. The other example could be a file source that emits current line number as a new variable along with the content (as#input
variable)
Generic implementation
Nussknacker also provides a more generic interface for implementing sources - FlinkSource.
Instead of providing a Flink DataStreamSource
, you can provide an arbitrary DataStream[Context]
directly. However, you
have to remember to assign timestamps, watermarks, and initialize the context.