Skip to main content
Version: 1.18

Data Typing and Schemas Handling

Overview

Nussknacker as a platform integrates diverse data sources, e.g. kafka topic or http request, and also allows to enrich data using e.g. OpenAPI or databases. These integrations can return several types of data like JSON, Binary, and DB data. In each case format of these data is described in a different way:

  • Request-response inputs and outputs are described by JSON Schema, stored in Nussknacker scenario's properties
  • Source Kafka with JSON data is described by JSON Schema, stored in the schema registry
  • Source Kafka with binary data is described by Avro schema, stored in the schema registry
  • OpenAPI enricher uses JSON data described by OpenAPI interface definition
  • Database data are described by JDBC metadata, that contain column information

Typing architecture

To provide consistent and proper support for these formats Nussknacker converts meta-information about data to its own Typing Information, which is used on the Designer's part to hint and validate the data. Each part of the diagram is statically validated and typed on an ongoing basis.

Avro schema

We support Avro schema in version: 1.11.0. Avro is available only on Streaming. You need Schema Registry if you want to use Avro Schema.

Source conversion mapping

Primitive types

Avro typeJava typeComment
nullnull
stringString
booleanBoolean
intInteger32 bit
longLong64 bit
floatFloatsingle precision
doubleDoubledouble precision
bytesByteBuffer

Logical types

Conversion at source to the specific type means that behind the scene Nussknacker converts primitive type to logical type - Java objects, consequently, the end-user has access to methods of these objects.

Avro typeJava typeSampleComment
decimal (bytes or fixed)BigDecimal
uuid (string)UUID
date (int)LocalDate2021-05-17Timezone is not stored.
time - millisecond precision (int)LocalTime07:34:00.12345Timezone is not stored.
time - microsecond precision (long)LocalTime07:34:00.12345Timezone is not stored.
timestamp - millisecond precision (long)Instant2021-05-17T05:34:00ZTimestamp (millis since 1970-01-01) in human readable format.
timestamp - microsecond precision (long)Instant2021-05-17T05:34:00ZTimestamp (millis since 1970-01-01) in human readable format.

Complex types

Avro typeJava typeComment
arraylist
mapmapKey - value map, where key is always represented by String.
recordrecord
enumsorg.apache.avro.generic.GenericData.EnumSymbol
fixedorg.apache.avro.generic.GenericData.Fixed
unionAny of the above typesIt can be any of the defined type in union.

Sink validation & encoding

Java typeAvro typeComment
nullnull
Stringstring
Booleanboolean
Integerint
Longlong
Floatfloat
Doubledouble
ByteBufferbytes
listarray
mapmap
maprecord
org.apache.avro.generic.GenericRecordrecord
org.apache.avro.generic.GenericData.EnumSymbolenums
StringenumsOn the Designer we allow to pass Typing Information String, but we can't verify whether value is a valid Enum's symbol.
org.apache.avro.generic.GenericData.Fixedfixed
ByteBufferfixedOn the Designer we allow to pass Typing Information ByteBuffer, but we can't verify whether value is a valid Fixed element.
StringfixedOn the Designer we allow to pass Typing Information String, but we can't verify whether value is a valid Fixed element.
BigDecimaldecimal (bytes or fixed)
ByteBufferdecimal (bytes or fixed)
UUIDuuid (string)
Stringuuid (string)On the Designer we allow to pass Typing Information String, but we can't verify whether value is a valid UUID.
LocalDatedate (int)
Integerdate (int)
LocalTimetime - millisecond precision (int)
Integertime - millisecond precision (int)
LocalTimetime - microsecond precision (long)
Longtime - microsecond precision (long)
Instanttimestamp - millisecond precision (long)
Longtimestamp - millisecond precision (long)
Instanttimestamp - microsecond precision (long)
Longtimestamp - microsecond precision (long)
Any matching type from the list of types in the union schemaunionRead more about validation modes.

If at runtime value cannot be converted to an appropriate logic schema (e.g. "notAUUID" cannot be converted to proper UUID), then an error will be reported.

JSON Schema

We support JSON Schema in version: Draft 7 without:

JSON Schema is available on Streaming and Request-Response. To integrate with JSON on streaming we use Schema Registry. On the other hand, we have Request-Response where schemas are stored in scenario properties.

Source conversion mapping

JSON SchemaJava typeComment
nullnull
stringStringUTF-8
booleanBoolean
integerInteger/Long/BigIntegerThere will be chosen narrowest type depending upon minimum maximum value defined in the json schema. In the case when no min/max boundaries are available it will map to Long by default
numberBigDecimal
enumString
arraylist

String Format

We support the following JSON string format keywords.

JSON SchemaJava typeSampleComment
date-timeZonedDateTime2021-05-17T07:34:00+01:00Must carry zone information
dateLocalDate2021-05-17Timezone is not stored
timeLocalTime07:34:00.12345+01:00Must carry zone information

Objects

object configurationJava typeComment
object with propertiesmapMap[String, _]
object with properties and enabled additionalPropertiesmapAdditional properties are available at runtime. Similar to Map[String, _].
object without properties and additionalProperties: truemapMap[String, Unknown]
object without properties and additionalProperties: {"type": "integer"}mapMap[String, Integer]

We support additionalProperties, but additional fields won't be available in the hints on the Designer. To get an additional field you have to do #inpute.get("additional-field"), but remember that result of this expression is depends on additionalProperties type configuration and can be Unknown.

Schema Composition

typeJava typeComment
oneOfAny from the available list of the schemasWe treat it just like a union.
anyOfAny from the available list of the schemasWe treat it just like a union.

Sink validation & encoding

Java typeJSON SchemaComment
nullnull
Stringstring
Booleanboolean
Integerinteger
LongintegerOnly if the minimum/maximum values in json schema for this type are not defined or these values do not fit in java's Integer type range.
BigIntegerintegerOnly if the minimum/maximum values in json schema for this type do not fit in java's Long type range.
Floatnumber
Doublenumber
BigDecimalnumber
listarray
mapobject
StringenumOn the Designer we allow to pass Typed[String], but we can't verify whether value is a valid Enum's symbol.
Any matching type from the list of types in the union schemaschema composition: oneOf, anyOfRead more about validation modes.

Properties not supported on the Designer

JSON SchemaPropertiesComment
stringlength, regular expressions, format
numericmultiples
arrayadditional items, tuple validation, contains, min/max, length, uniqueness
objectunevaluatedProperties, extending closed, property names, size
compositionallOf, notRead more about validation modes.

These properties will be not validated by the Designer, because on during scenario authoring time we work only on Typing Information not on real value. Validation will be still done at runtime.

Pattern properties

Sources

Object (also nested) in source schema will be represented during scenario authoring as:

  • Map - when there is no property defined in properties field
    • if only additionalProperties are defined then map values will be typed to according to schema in additionalProperties field
    • if both additionalProperties and patternProperties are defined then values will be typed as Union with all possible types from additionalProperties and patternProperties
  • Record otherwise
    • all non explicit properties can then be accessed using record["patternOrAdditionalPropertyName"] syntax but for now only if pl.touk.nussknacker.engine.api.process.ExpressionConfig.dynamicPropertyAccessAllowed is enabled ( only possible in deprecated instalations with own ProcessConfigCreator)
Sinks

Pattern properties add additional requirements during scenario authoring for types that should be encoded into JSON Schema object type:

  • Strict mode
    • only records types are allowed (no map types) and only if their fields' types are valid according to pattern properties restrictions (in addition to properties and additionalProperties)
  • Lax mode
    • records are allowed under the same conditions as in strict mode but additionally Unknown type is allowed as a value's type
    • map types are allowed if their value's type matches any of property, patternProperty or additionalProperties schema or is an Unknown type

Validation and encoding

As we can see above on the diagram, finally preparing data (e.g. Kafka sink / response sink) is divided into two parts:

  • during validation on the Designer, the Typing Information is compared against the sink schema
  • encoding data at runtime based on the data type, the data is converted to the internal representation expected by the sink

Sometimes situations can happen that are not so obvious to handle, e.g. how we should pass and validate Unknown and Union types.

type Unknown

A situation when Nussknacker can not detect the exact type of data.

type Union

A situation when the data can be any of several representation.

In the case of Union and Unknown types, the actual data type is known only at runtime - only then the decision how to encode can be taken. Sometimes, it may happen that encoding will not be possible, due to the mismatch of the actual data type and data type expected in the sink and the runtime error will be reported. The number of runtime encoding errors can be reduced by applying strict schema validation rules during scenario authoring. This is the place where validation mode comes in.

Validation modes

Validation modes determines how Nussknacker handles validation Typing Information against the sink schema during the scenario authoring. You can set validation mode by setting Value validation mode param on sinks where raw editor is enabled.

Strict modeLax modeComment
allow passing additional fieldsnoyesThis option works only at Avro Schema. JSON Schema manages additional fields itself explicitly by schema property: additionalProperties.
require providing optional fieldsyesno
allow passing UnknownnoyesWhen data at runtime will not match against the sink schema, then error be reported during encoding.
passing UnionTyping Information union has to
be the same as union schema of the sink
Any of element from Typing Information
union should match
When data at runtime will not match against the sink schema, then error be reported during encoding.

General intuition is that in strict mode a scenario that was successfully validated should not produce any type connect encoding errors during runtime (it can still produce errors e.g. for range validation in JSON Schema or valid enum entry validation in Avro). On the other hand, in lax mode NU allows to deploy scenario if there is any chance it can encode data properly, but responsibility for passing valid type to sink (e.g. in Unknown type) is on end-user side.

We leave to the user the decision of which validation mode to choose. But be aware of it, and remember it only impacts how we validate data during scenario authoring, and some errors can still occur during encoding at runtime.