I’m very happy to announce the release of Debezium 1.2.0.Alpha1!

This first drop of the 1.2 release line provides a number of useful new features:

  • Support for message transformations (SMTs) and converters in the Debezium embedded engine API

  • A new SMT for filtering out change events using scripting languages

  • Automatic reconnects for the SQL Server connector

  • A new column masking mode using consistent hash values

Overall, the community fixed not less than 41 issues for this release. Let’s take a closer look at some of them in the remainder of this post.

Embedded Engine Improvements

Debezium’s embedded engine is a very useful tool for handling change events in cases where Apache Kafka and Kafka Connect are not available. For instance it allows to use Debezium’s CDC capabilities and stream change events to alternative messaging infrastructure such as Amazon Kinesis or Google Pub/Sub.

To further improve the experience when working with this API, it supports the serialization of change events into different formats now (DBZ-1807): JSON, Avro and CloudEvents. This spares developers from having to deal with record serialization themselves. As an example, here is how to use JSON as serialization format:

Properties props = new Properties();

// don't include schema in message
props.setProperty("converter.schemas.enable", "false"); (1)
// further properties as needed...

DebeziumEngine<ChangeEvent<String>> engine = DebeziumEngine.create(Json.class) (2)
    .using(props)
    .notifying((records, committer) -> { (3)
        for (ChangeEvent<String> r : records) {
            System.out.println("Key = '" + key + "' value = '" + value + "'");
            committer.markProcessed(r);
        }
    })
    .build();
1 All the options of the underlying converter can be used
2 Json.class is a type token requesting serialization into JSON
3 records is a batch of change events, represented as JSON strings

The embedded engine now also supports the usage of Kafka Connect SMTs (DBZ-1930). They can be simply configured via the properties passed to the engine builder:

Properties props = new Properties();

props.setProperty("transforms", "router");
props.setProperty("transforms.router.type", "org.apache.kafka.connect.transforms.RegexRouter");
props.setProperty("transforms.router.regex", "(.*)");
props.setProperty("transforms.router.replacement", "trf$1");

This allows to use any existing Kafka Connect SMT, such as the ones coming with Kafka Connect itself, or Debezium’s SMTs, e.g. for topic routing, new record state extraction and outbox event routing.

These improvements lay the foundation to the upcoming stand-alone Debezium runtime, which will be based on the embedded engine and make its functionality available as a ready-to-use service.

Content-based Event Filtering

This release also adds another very versatile transformation to Debezium: the message filter SMT. Applied to the Debezium connectors on the source side of a Kafka Connect data streaming pipeline, it allows to filter out specific change events based on their field values.

E.g. you could use this to filter out any change events of a specific customer type or product category. Filters are given as script expressions, using any language compatible with the javax.scripting API (JSR 223). Note Debezium doesn’t provide any such scripting language implementation itself; instead you can choose from a wide range of available options such as Groovy, MVEL or graal.js (JavaScript via GraalVM) and add it to the Kafka Connect plug-in path yourself.

Here’s an example using Groovy:

...
transforms=filter
transforms.filter.type=io.debezium.transforms.Filter
transforms.filter.language=jsr223.groovy
transforms.filter.condition=value.after.customerType != 42
...

value is the change event’s value; you could also refer to the event’s key and even the corresponding schema objects. Groovy automatically resolves property paths such as value.after.customerType to look-ups in map-like data structures such as Kafka Connect’s Struct type. This allows for very concise filtering conditions.

Note this SMT is incubating state for now, i.e. details around its API and configuration surface may still change. Please give it a try and share your experiences.

Other Features

Besides these key features, there’s a number of other new functionalities coming with the 1.2.0.Alpha1 release:

  • New metrics NumberOfDisconnects and NumberOfPrimaryElections for the MongoDB connector (DBZ-1859)

  • Support for automatic reconnects after connection losses in the SQL Server connector (DBZ-1882)

  • New column masking mode "consistent hashing" (DBZ-1692): Debezium allows to mask specific column values, e.g. to satisfy concerns around data privacy and protection. Using the new "consistent hashing" mode it’s now possible to not only use asterisks as masking characters, but also hash values based on the masked data contents. Quoting the original issue reporter, this "will be useful for [anonymizing] data but in this case it still needs to be relatable between topics. It’s a typical requirement for warehouses where you want to anonymize sensitive data but still need to keep referential integrity of your data"

  • Allowing to link update change events in case of primary key updates (DBZ-1531): most relational Debezium connectors represent an update to the primary key of a record by a delete event using the old key and a subsequent insert event using the updated key; using the new record headers __debezium.newkey and __debezium.oldkey, it is now possible for consumers to link these change events together when working with change data from the MySQL and Postgres connectors

  • Upgrade of Debezium’s container images to Apache Kafka 2.4.1 (DBZ-1925)

Bugfixes

Also a number of bugs were fixed, e.g.:

  • High CPU usage when the Postgres connector is idle (DBZ-1960)

  • Empty wal2json empty change event could cause NPE (DBZ-1922)

  • Cassandra Connector: unable to deserialize column mutation with reversed type (DBZ-1967)

  • Outbox Quarkus Extension throws NPE in quarkus:dev mode (DBZ-1966)

  • Validation of binlog_row_image is not compatible with MySQL 5.5 (DBZ-1950)

Please refer to the release notes for the complete list of resolved issues as well as procedures for upgrading from earlier Debezium versions. We’ve also backported the critical bugfixes to the 1.1 branch and will release Debezium 1.1.1 tomorrow.

A big thank you to all the contributors from the community who worked on this release: Alexander Iskuskov, Alexander Schwartz, Bingqin Zhou, Fatih Güçlü Akkaya, Grant Cooksey, Jan-Hendrik Dolling, Luis Garcés-Erice, Nayana Hettiarachchi and René Kerner!

Gunnar Morling

Gunnar is a software engineer at Decodable and an open-source enthusiast by heart. He has been the project lead of Debezium over many years. Gunnar has created open-source projects like kcctl, JfrUnit, and MapStruct, and is the spec lead for Bean Validation 2.0 (JSR 380). He’s based in Hamburg, Germany.

   


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.