As we are just one step away from the Debezium 2.5 final release, I am pleased to announce that Debezium 2.5.0.CR1 is now available. This release includes a number of improvements like AWS SQS sink for Debezium Server, INSERT/DELETE semantics for incremental snapshot watermarking, ReselectColumnsPostProcessor, uniform Oracle LOB behavior.

Additionally, this release includes a variety of bug fixes and several breaking changes.

Let’s take a closer look at all these changes and improvements included in Debezium 2.5.0.CR1; as always, you can find the complete list of changes for this release in the release notes. Please remember to take special note to any breaking changes that could affect your upgrade path.

Breaking changes

While we strive to avoid breaking changes, sometimes those changes are inevitable to evolve the right direction. This release includes several breaking changes.

Schema name for Cloud Event headers

The schema name prefix and letter casing for Cloud Event headers was not consistent with payload name. The schema name was aligned so both headers and payload share th same namespace and follow the same rules for letter casing (DBZ-7216).

MySQL BIT default length

MySQL BIT datatype did not have an implicit length if any was not set. This is incorrect as the default length if none is provided is 1 (DBZ-7230).

New features and improvements

Debezium 2.5 also introduces more improvements and features, lets take a look at each individually.

Re-select columns

In some cases, because of the way that certain source databases function, when a Debezium connector emits a change event, the event might exclude values for specific column types. For example, values for TOAST columns in PostgreSQL, LOB columns in Oracle, or Extended String columns in Oracle Exadata, might all be excluded.

Debezium 2.5 introduces the ReselectColumnsPostProcessor providing a way to re-select one or more columns from a database table and fetch the current state. You can configure the post processor to re-select the following column types:

  • null columns.

  • columns that contain the unavailable.value.placeholder sentinel value.

Configuring a PostProcessor is similar to configuring a CustomConverter or Transformation, except that it works on the mutable payload’s Struct rather than the SourceRecord.

Debezium Server - StreamNameMapper for Apache Kafka sink

The Kafka sink behaviour can now be modified by a custom logic providing alternative implementations for specific functionalities. When the alternative implementations are not available then the default ones are used.

For more details, please see the Apache Kafka Injection points.

INSERT/DELETE semantics for incremental snapshot watermarking

The property incremental.snapshot.watermarking.strategy has been introduced to let users choose the watermarking strategy to use during an incremental snapshot.

The insert_insert (old behavior) approach lets Debezium creating two entries in the signaling data collection for each chunk during the snapshot to signal the opening of the snapshot window and another to mark its closure.

On the other hand, with the insert_delete option, a single entry is written in the signaling data collection for each chunk at the beginning of the window. After completion, this entry is removed, and no corresponding entry is added to signify the closure of the snapshot window. This method aids in more efficient management of the signaling data collection.

For more details, please see the Connector properties section of the connector of your interest.

Debezium Server - AWS SQS sink

Amazon Simple Queue Service (Amazon SQS) is a distributed message queuing service. It supports programmatic sending of messages via web service applications as a way to communicate over the Internet. SQS is intended to provide a highly scalable hosted message queue that resolves issues arising from the common producer–consumer problem or connectivity between producer and consumer.

Debezium 2.5 offers the possibility to send events to Amazon SQS.

Oracle LOB behavior

Debezium 2.5 aligns LOB behavior in snapshot and streaming. When lob.enabled is set to false, the unavailable value placeholder will be explicitly included during snapshot to match the behavior of streaming.

Other fixes

In addition, there were quite a number of stability and bug fixes that made it into this release. These include the following:

  • Oracle abandoned transaction implementation bug causes OoM DBZ-7236

  • Add Grammar Oracle Truncate Cluster DBZ-7242

  • Length value is not removed when changing a column’s type DBZ-7251

  • MongoDB table/collection snapshot notification contain incorrect offsets DBZ-7252

  • Broken support for multi-namespace watching DBZ-7254

  • Add tracing logs to track execution time for Debezium JDBC connector DBZ-7217

  • Validate & clarify multiple archive log destination requirements for Oracle DBZ-7218

  • Upgrade logback to 1.2.13 DBZ-7232

Altogether, 16 issues were fixed for this release. A big thank you to all the contributors from the community who worked on this release: Bob Roldan, Chris Cranford, Gunnar Morling, Harvey Yue, Ilyas Ahsan, Indra Shukla, Jakub Cechacek, Jiabao Sun, Jiri Kulhanek, Jiri Pechanec, Jordan Pittier, Mario Fiore Vitale, Nils Hartmann, Roman Kudryashov, Sebastiaan Knijnenburg, Tudor Plugaru, V K, and Zhongqiang Gong!

What’s next?

We have just over a week before the team takes a break for the holidays, and so we are preparing for Debezium 2.5 final release. We intend to release it the week before the holiday break.

The team has also finalized the roadmap for 2024, here’s a sneak peek at some highlights (and remember, this is just the tip of the iceberg!):

  • Asynchronous-based processing in Debezium Engine

  • Official MariaDB connector

  • User-friendly offset manipulation (i.e, start at a specific position in the transaction logs)

  • Sink connector for MongoDB

For more details, please check out our road map for all upcoming details around Debezium 2.6 and beyond.

As always, please be sure to get in touch with us on the mailing list or Zulip chat if you have questions or feedback. Until next time, stay warm out there!

Fiore Mario Vitale

Mario is a Senior Software Engineer at Red Hat. He lives in Italy.

   


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve ours existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.