Complex Event Processing with Apache Flink — Visual CEP with Nussknacker

Complex Event Processing is one of Apache Flink's most powerful but underused capabilities. This article shows how Nussknacker wraps Flink's MATCH_RECOGNIZE in a visual editor, with a complete flight pattern detection example on live OpenSky data.

What is Complex Event Processing in Apache Flink?

Complex Event Processing (CEP) detects meaningful sequences of events in real-time data streams. Unlike regular stream processing — which filters, aggregates, or enriches individual events — CEP looks for patterns across multiple events over time.

Did an aircraft descend below a certain altitude, continue descending for several reports, then touch the ground? That's a landing. Did it descend, then suddenly climb again without touching down? That's a go-around. These are event sequences — and detecting them in real time is exactly what CEP is built for.

Apache Flink implements CEP through the MATCH_RECOGNIZE clause in Flink SQL. It allows you to define pattern variables (states), specify the sequence they should appear in, set conditions for each state, and extract results when a full match is detected. It works like regular expressions, but applied to event streams instead of text.

As Kai Waehner recently wrote, CEP is one of Flink's most powerful but underused capabilities. The reason is simple: it's hard to use.

The problem: powerful but painful

Writing a MATCH_RECOGNIZE query means dealing with SQL that can quickly grow to dozens or hundreds of lines. Here's what a single go-around detection pattern looks like in raw Flink SQL — 46 lines, and this is just one of three patterns in our example scenario:
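A condensed sketch of that query gives a feel for the shape (the table name `flights`, the event-time column, and the exact thresholds are assumptions for illustration; the real query also carries the full field-mapping boilerplate):

```sql
SELECT *
FROM flights
  MATCH_RECOGNIZE (
    PARTITION BY icao24
    ORDER BY event_time
    MEASURES
      A.callsign            AS callsign,
      A.event_time          AS approach_start,
      LAST(D.event_time)    AS climb_confirmed,
      MIN(B.baro_altitude)  AS lowest_altitude
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (A B+ C{2,}? D) WITHIN INTERVAL '10' MINUTE
    DEFINE
      -- A: aircraft on approach, below 1500 m
      A AS A.baro_altitude < 1500 AND A.on_ground = FALSE,
      -- B: continued descent below 800 m, altitude strictly decreasing
      B AS B.baro_altitude < 800
           AND (LAST(B.baro_altitude, 1) IS NULL
                OR B.baro_altitude < LAST(B.baro_altitude, 1)),
      -- C: climbing again (reluctant, so D gets a chance to match)
      C AS (LAST(C.baro_altitude, 1) IS NULL
              AND C.baro_altitude > LAST(B.baro_altitude))
           OR C.baro_altitude > LAST(C.baro_altitude, 1),
      -- D: climb confirmed above the last C reading
      D AS D.baro_altitude > LAST(C.baro_altitude)
  )
```

Even in this shortened form, the structural overhead is visible: partitioning, ordering, measures, match strategy, pattern, and per-variable conditions all live inside one monolithic clause.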

The first 20 lines are pure boilerplate — mapping input fields. The actual pattern logic starts halfway through. And you still need to understand greedy vs reluctant quantifiers, handle NULL guards for functions like LAST(), manage append-only vs changelog mode, and debug with no visibility into what's happening inside the match.

There is no built-in way to test a pattern against sample data. There is no way to preview intermediate states. If something doesn't match, you're left staring at 46 lines of SQL trying to figure out which condition failed.

We've written extensively about why Streaming SQL is not the right tool for building complex event-driven applications. SQL was designed for tables, not for stateful, time-sensitive logic with branching, scoped variables, and complex conditions. And yet, MATCH_RECOGNIZE is genuinely useful — too useful to ignore.

What is Nussknacker?

Nussknacker is an open-source, low-code streaming IDE for Apache Kafka and Apache Flink. It lets domain experts and developers build event-driven applications visually — without writing code.

Instead of programming Flink jobs in Java or Scala, you design scenario graphs in the browser. Each node in the graph performs an operation: reading from Kafka, filtering, enriching with external APIs or databases, branching, aggregating over time windows, calling ML models, or writing results to Kafka, databases, or HTTP endpoints.

Nussknacker provides contextual autocompletion, live data preview at every node, built-in testing with assertions, version history, and one-click deployment. It runs on Apache Flink for stateful stream processing or on a lightweight Kubernetes-native engine for simpler workloads.

Best of both Flink APIs

Apache Flink offers two programming models. The DataStream API gives you full control over state, timers, side outputs, and branching — but it requires Java or Scala and deep Flink expertise. The Table API and Flink SQL give you declarative power for windowing, joins, and pattern matching — but they break down when logic gets complex, and they offer limited observability and testing.

Nussknacker is built on Flink's DataStream API. The visual scenario graph gives you DataStream-level capabilities: branching, scoped variables, stateful aggregations, HTTP enrichment, ML model inference, and custom logic — all without coding.

On top of that, Nussknacker implements Flink's Table API, so you can use Flink SQL where it genuinely excels — like MATCH_RECOGNIZE for Complex Event Processing, or ROW_NUMBER for deduplication — as nodes within the same scenario graph. The SQL executes inside Flink, but it's wrapped in a visual editor with templates, live previews, and testing.

You get the expressiveness of the DataStream API and the declarative power of the Table API, together, in one visual environment.

How Nussknacker makes Flink CEP accessible

Nussknacker provides a visual editor for Flink SQL MATCH_RECOGNIZE queries. Instead of writing raw SQL, you:

  1. Select a query type from a dropdown — Match Recognize, Deduplication, or Generic (raw SQL). Each type provides a tailored form.
  2. Pick an input variable — Nussknacker automatically maps all fields from your Kafka source, generating the SELECT boilerplate you'd otherwise write by hand.
  3. Define the pattern visually — colored pills represent pattern variables (A → B → C). Click a pill to set its quantifier (one or more, two or more, reluctant variants) and description.
  4. Set conditions per state — each pattern variable has a card with condition rows. Simple conditions use dropdowns (field, operator, value). Complex conditions switch to raw expression mode for things like LAST(B.baro_altitude, 1) IS NULL OR B.baro_altitude < LAST(B.baro_altitude, 1).
  5. Configure match options — rows per match, after match strategy, and an optional WITHIN time constraint for efficient state management.
  6. Define output measures — a dialog maps pattern variables to output fields using FIRST(), LAST(), or direct references. These become fields on the output variable, ready to use downstream.
  7. Preview generated SQL — a collapsible panel shows the full SQL query that Nussknacker generates from your visual configuration.

Detecting flight patterns on live OpenSky data

To demonstrate, we built a complete flight event processing pipeline using real-time data from the OpenSky Network — a crowd-sourced flight tracking database that provides aircraft positions, altitudes, velocities, and more via a public REST API.

The scenario architecture

The scenario reads from a Kafka topic containing live flight data (ingested from OpenSky in a separate scenario). It then:

  1. Deduplicates — OpenSky can report the same aircraft state multiple times. A Flink SQL deduplication node partitions by icao24 (aircraft identifier) and time_position, keeping only the first event per key.
  2. Splits — a Split node fans out the deduplicated stream to three parallel branches.
  3. Detects patterns — each branch runs a MATCH_RECOGNIZE query for a different flight event.

Full scenario graph with live event counts — aiven source to deduplication to Split to three detection branches to three Kafka sinks

Deduplication

OpenSky reports the same aircraft position multiple times. The deduplication node uses Flink SQL ROW_NUMBER to keep only the first event per aircraft per timestamp — filtering 476 incoming events down to 372 unique records.
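The deduplication follows the standard Flink SQL ROW_NUMBER idiom; a minimal sketch, with the table name and time attribute assumed:

```sql
SELECT icao24, time_position, callsign, baro_altitude, on_ground
FROM (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY icao24, time_position  -- one row per aircraft per timestamp
           ORDER BY proc_time ASC              -- keep the first event seen
         ) AS row_num
  FROM flights
)
WHERE row_num = 1
```

In Nussknacker, the Deduplication template generates this query from a form: you pick the partition keys and the ordering, and the boilerplate is written for you.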

Deduplication node with live incoming and outgoing records

Landing detection

The simplest pattern: aircraft in the air (on_ground = FALSE), followed by aircraft on the ground (on_ground = TRUE), within a 3-minute window.

Output measures: aircraft ICAO address, callsign, airborne time, touchdown time, and touchdown coordinates.
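In MATCH_RECOGNIZE terms, the core of this pattern could look like the following sketch (quantifiers, field names, and measure names are assumptions):

```sql
MEASURES
  LAST(A.time_position)  AS last_airborne_time,
  B.time_position        AS touchdown_time,
  B.latitude             AS touchdown_lat,
  B.longitude            AS touchdown_lon
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (A+ B) WITHIN INTERVAL '3' MINUTE
DEFINE
  A AS A.on_ground = FALSE,  -- one or more airborne reports
  B AS B.on_ground = TRUE    -- followed by a report on the ground
```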

Landing detection node with live data showing a detected landing

Go-around detection

A more complex four-state pattern: aircraft descending below 1500m (A), continuing descent below 800m with decreasing altitude (B, one or more events), then climbing (C, two or more events, reluctant quantifier), confirmed by reaching altitude above the climb phase (D).

The reluctant quantifier {2,}? on C is important — without it, Flink's greedy matching would consume the terminal event D, preventing the match from completing. This is one of the subtleties that the visual editor makes explicit: you select "two or more, reluctant" from a dropdown instead of remembering to append ? to the quantifier.
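The difference is a single character in the PATTERN clause:

```sql
PATTERN (A B+ C{2,} D)   -- greedy: C keeps absorbing climb events, so D may never get a row
PATTERN (A B+ C{2,}? D)  -- reluctant: C stops as soon as D's condition can be satisfied
```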

Go-around pattern visual editor showing A to B to C to D pills and DEFINE cards with conditions

Rapid descent detection

Detects altitude drops faster than 15 meters per second from above 3000 meters: initial rapid descent (A), continued descent with decreasing altitude (B, two or more events), followed by recovery where vertical rate eases above -15 m/s (C).

The MEASURES clause computes altitude_lost as the difference between start and end altitude, giving immediate insight into the severity of the descent.
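Sketched in SQL (field names and the exact guard conditions are assumptions), the MEASURES and DEFINE clauses could look like:

```sql
MEASURES
  A.baro_altitude                          AS start_altitude,
  LAST(B.baro_altitude)                    AS end_altitude,
  A.baro_altitude - LAST(B.baro_altitude)  AS altitude_lost
ONE ROW PER MATCH
PATTERN (A B{2,} C)
DEFINE
  -- A: rapid descent starting above 3000 m
  A AS A.vertical_rate < -15 AND A.baro_altitude > 3000,
  -- B: descent continues, altitude strictly decreasing
  B AS B.vertical_rate < -15
       AND (LAST(B.baro_altitude, 1) IS NULL
            OR B.baro_altitude < LAST(B.baro_altitude, 1)),
  -- C: recovery, vertical rate eases above -15 m/s
  C AS C.vertical_rate > -15
```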

Generic SQL mode

For patterns that don't fit the visual templates, you can switch to Generic query type and write any Flink SQL directly. Nussknacker still provides the input variable mapping and field preview — you just write the query yourself.

Generic query type with raw SQL editor showing the go-around MATCH_RECOGNIZE query

Writing to Kafka

Each detection branch ends with a Kafka sink writing to its own Avro-encoded topic: landing_events, go_around_events, and rapid_descent_events. The data mapper auto-maps output variable fields to the Avro schema.

Testing CEP patterns in Nussknacker

One of the biggest advantages over raw Flink SQL is built-in testing. Each pattern has its own test case with carefully crafted event sequences — a landing sequence with decreasing altitude ending on the ground, a go-around with descent followed by climb, a rapid descent with vertical rates below -15 m/s.

You paste test events directly in the testing tab, add assertions (e.g., "records.size equals 1" for a single expected match), and run all test cases at once. Results show immediately — green checkmarks or failures with details.
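A test event sequence for the landing pattern might look like the following (the field names mirror the OpenSky state vectors used elsewhere in the scenario; the identifiers and values are invented):

```json
[
  {"icao24": "abc123", "callsign": "LOT456", "time_position": 1700000000, "baro_altitude": 450.0, "on_ground": false},
  {"icao24": "abc123", "callsign": "LOT456", "time_position": 1700000030, "baro_altitude": 120.0, "on_ground": false},
  {"icao24": "abc123", "callsign": "LOT456", "time_position": 1700000060, "baro_altitude": 0.0, "on_ground": true}
]
```

With an assertion such as "records.size equals 1", this sequence should produce exactly one landing match and no false positives.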

Test cases dropdown showing all four test cases: deduplicate, landing pattern, go around pattern, rapid descent

Full scenario graph with all tests passing — green badges on every node

This makes iterating on patterns fast. Change a condition threshold, re-run the test, see if the pattern still matches — in seconds, not the minutes it would take to redeploy a Flink job.

Deploy and observe

With tests passing, deployment is one click. The scenario goes live on Apache Flink, processing the incoming flight data stream in real time. Nussknacker shows live event counts on each node — you can see events flowing through deduplication, splitting to three branches, and matches appearing at each sink.

Opening any node while the scenario is running shows incoming and outgoing records in real time. For the landing detection node, you can see actual detected landings with aircraft callsigns, timestamps, and touchdown coordinates — live, as they happen.

Landing detection node while deployed — live incoming and outgoing records with detected landings

When to use Nussknacker for Flink CEP

Nussknacker is a good fit when you want to:

  • Detect patterns in event streams using Flink MATCH_RECOGNIZE without writing raw SQL
  • Combine CEP with other stream processing operations (enrichment, filtering, aggregation, ML inference) in a single visual pipeline
  • Test detection patterns with real event sequences before deployment
  • Let domain experts — not just Flink developers — build and iterate on detection logic
  • Deploy and monitor scenarios with one click, with full version history

Nussknacker is open source and available as a managed cloud service. The Flink SQL integration — including the visual Match Recognize editor, Deduplication template, and Generic SQL mode — runs on Nussknacker's Apache Flink engine.

Video walkthrough

Watch the full demo: building this flight event processing pipeline from scratch.

Part 1 — Ingesting live flight data from OpenSky Network into Kafka:

Part 2 — Processing flight data with CEP pattern detection:

Further reading