By default, Debezium delivers every data change event that it receives to the Kafka broker. However, in many cases, you might be interested in only a subset of the events emitted by the producer. To enable you to process only the records that are relevant to you, Debezium provides the filter single message transform (SMT).
While it is possible to use Java to create a custom SMT to encode filtering logic, using a custom-coded SMT has its drawbacks. For example:
It is necessary to compile the transformation up front and deploy it to Kafka Connect.
Every change needs code recompilation and redeployment, leading to inflexible operations.
The filter SMT supports scripting languages that integrate with JSR 223 (Scripting for the Java™ Platform).
For security reasons, the filter SMT is not included with the Debezium connector archives.
Instead, it is provided in a separate artifact,
To use the filter SMT with a Debezium connector plug-in, you must explicitly add the SMT artifact to your Kafka Connect environment.
IMPORTANT: After the filter SMT is present in a Kafka Connect instance, any user who is allowed to add a connector to the instance can run scripting expressions. To ensure that only authorized users can run scripting expressions, secure the Kafka Connect instance and its configuration interface before you add the filter SMT.
Download the scripting SMT archive
Extract the contents of the archive into the Debezium plug-in directories of your Kafka Connect environment.
Obtain a JSR-223 script engine implementation and add its contents to the Debezium plug-in directories of your Kafka Connect environment.
Restart your Kafka Connect process to pick up the new JAR files.
The Groovy language needs the following libraries on the classpath:
You configure the filter transformation in the Debezium connector’s Kafka Connect configuration. In the configuration, you specify the events that you are interested in by defining filter conditions that are based on business rules. As the filter SMT processes the event stream, it evaluates each event against the configured filter conditions. Only events that meet the criteria of the filter conditions are passed to the broker.
To configure a Debezium connector to filter change event records, configure the Filter SMT in the Kafka Connect configuration for the Debezium connector.
Configuration of the filter SMT requires you to specify an expression, written in a supported scripting language, that defines the filtering criteria.
For example, you might add the following configuration in your connector configuration.
transforms.filter.condition=value.op == 'u' && value.before.id == 2
The preceding example specifies the use of the Groovy expression language.
The expression value.op == 'u' && value.before.id == 2 removes all messages, except those that represent update (u) records with id values that are equal to 2.
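For context, a condition such as the one above is normally one of several properties that register the transformation with the connector. The following fragment is a minimal sketch of such a configuration. The alias filter is an arbitrary name, and the class name io.debezium.transforms.Filter and the language identifier jsr223.groovy are the values that Debezium documents for this SMT, so verify them against the version that you deploy.

```properties
# Register one transformation under the (arbitrary) alias "filter".
transforms=filter
# SMT class that is provided by the Debezium scripting artifact.
transforms.filter.type=io.debezium.transforms.Filter
# JSR 223 engine to use; the Groovy engine must be on the classpath.
transforms.filter.language=jsr223.groovy
# Keep only update events whose "before" state has id == 2.
transforms.filter.condition=value.op == 'u' && value.before.id == 2
```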
The preceding example shows a simple SMT configuration that is designed to process only DML events, which contain an op field.
Other types of messages that a connector might emit (heartbeat messages, tombstone messages, or metadata messages about schema changes and transactions) do not contain this field.
To avoid processing failures, you can define an SMT predicate statement that selectively applies the transformation to specific events only.
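One way to express such a predicate is with the TopicNameMatches predicate that ships with Kafka Connect. The fragment below is a sketch under stated assumptions: the predicate alias isDataTopic and the topic pattern are hypothetical, and the pattern assumes that data change topics share the prefix dbserver1.inventory while heartbeat, schema change, and transaction topics do not.

```properties
transforms=filter
transforms.filter.type=io.debezium.transforms.Filter
transforms.filter.language=jsr223.groovy
transforms.filter.condition=value.op == 'u' && value.before.id == 2
# Apply the transformation only when the predicate below matches.
transforms.filter.predicate=isDataTopic

# Hypothetical alias; the predicate type is part of Kafka Connect.
predicates=isDataTopic
predicates.isDataTopic.type=org.apache.kafka.connect.transforms.predicates.TopicNameMatches
# The regular expression must match the whole topic name.
# [.] matches a literal dot without backslash escaping in a properties file.
predicates.isDataTopic.pattern=dbserver1[.]inventory[.].*
```

With this configuration, messages on topics that do not match the pattern bypass the filter SMT and are passed on unchanged.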
Debezium binds certain variables into the evaluation context for the filter SMT. When you create expressions to specify filter conditions, you can use the variables that Debezium binds into the evaluation context. By binding variables, Debezium enables the SMT to look up and interpret their values as it evaluates the conditions in an expression.
The following table lists the variables that Debezium binds into the evaluation context for the filter SMT:
key: The key of the message.
value: The value of the message.
keySchema: The schema of the message key.
valueSchema: The schema of the message value.
topic: The name of the target topic.
headers: A Java map of message headers. Each key in the map is a header name.
An expression can invoke arbitrary methods on its variables.
Expressions should resolve to a Boolean value that determines how the SMT handles the message.
When the filter condition in an expression evaluates to true, the message is retained.
When the filter condition evaluates to false, the message is removed.
Expressions should not result in any side effects. That is, they should not modify any of the variables that are passed to them.
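As an illustration of how the bound variables can be combined, the following sketch retains every message except delete events on a single topic. The topic name dbserver1.inventory.customers is hypothetical, and the fragment assumes Groovy as the expression language. The expression only reads the variables, so it has no side effects.

```properties
transforms.filter.language=jsr223.groovy
# topic is compared first, so value.op is read only for messages on that topic.
# The condition is false (message removed) only for delete events ('d') there.
transforms.filter.condition=!(topic == 'dbserver1.inventory.customers' && value.op == 'd')
```

Because && short-circuits, messages on other topics evaluate to true without the expression accessing fields that those messages might not contain.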
In addition to the change event messages that a Debezium connector emits when a database change occurs, the connector also emits other types of messages, including heartbeat messages, and metadata messages about schema changes and transactions. Because the structure of these other messages differs from the structure of the change event messages that the SMT is designed to process, it’s best to configure the connector to selectively apply the SMT, so that it processes only the intended data change messages. You can use one of the following methods to configure the connector to apply the SMT selectively:
The way that you express filtering conditions depends on the scripting language that you use.
For example, as shown in the basic configuration example, when you use Groovy as the expression language, the following expression removes all messages, except for update records that have id values set to 2:
value.op == 'u' && value.before.id == 2
Other languages use different methods to express the same condition.
The Debezium MongoDB connector emits the
You could also take the approach of using a JSON parser within an expression to generate separate output documents for each array item.
In other languages, such as JavaScript, you can call the Struct#get() method to specify the filtering condition, as in the following example:
value.get('op') == 'u' && value.get('before').get('id') == 2
value.op == 'u' && value.before.id == 2
The following table lists the configuration options that you can use with the filter SMT.
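topic.regex: An optional regular expression that evaluates the name of the destination topic for an event to determine whether to apply filtering logic. If the name of the destination topic matches the value in topic.regex, the transformation applies the filtering logic to the event.
language: The language in which the expression is written. Must begin with the jsr223. prefix, for example, jsr223.groovy.
condition: The expression to be evaluated for every message. Must evaluate to a Boolean value where a result of true keeps the message and a result of false removes it.
null.handling.mode: Specifies how the transformation handles null (tombstone) messages.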
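The options can be combined in a single connector configuration. The following fragment is a sketch only. The alias filter and the topic pattern are hypothetical, and the fragment assumes the option names topic.regex, language, condition, and null.handling.mode, and the value drop, as documented for the Debezium filter SMT. Confirm them for the version that you deploy.

```properties
transforms=filter
transforms.filter.type=io.debezium.transforms.Filter
# Apply the filtering logic only to topics whose names match this pattern (hypothetical).
transforms.filter.topic.regex=dbserver1[.]inventory[.].*
transforms.filter.language=jsr223.groovy
transforms.filter.condition=value.op == 'u' && value.before.id == 2
# Assumed value: discard null (tombstone) messages.
transforms.filter.null.handling.mode=drop
```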