As we are just one step away from the Debezium 2.5 final release, I am pleased to announce that Debezium 2.5.0.CR1 is now available. This release includes a number of improvements, such as an AWS SQS sink for Debezium Server, INSERT/DELETE semantics for incremental snapshot watermarking, the ReselectColumnsPostProcessor, and uniform Oracle LOB behavior.
Additionally, this release includes a variety of bug fixes and several breaking changes.
Let’s take a closer look at all these changes and improvements included in Debezium 2.5.0.CR1; as always, you can find the complete list of changes for this release in the release notes. Please remember to take special note of any breaking changes that could affect your upgrade path.
Breaking changes
While we strive to avoid breaking changes, sometimes they are inevitable in order to evolve in the right direction. This release includes several breaking changes.
Schema name for Cloud Event headers
The schema name prefix and letter casing for Cloud Event headers were not consistent with the payload name. The schema name has been aligned so that both headers and payload share the same namespace and follow the same rules for letter casing (DBZ-7216).
MySQL BIT default length
The MySQL BIT datatype did not have an implicit length if none was set. This is incorrect, as the default length when none is provided is 1 (DBZ-7230).
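To illustrate the corrected rule, here is a minimal Python sketch that resolves the length of a BIT column type expression. The helper name `bit_length` and the regular expression are hypothetical and are not taken from Debezium's DDL parser; the only fact relied on is that MySQL treats a BIT column declared without a length as BIT(1).

```python
import re

# Hypothetical helper for illustration only; this is not Debezium's parser.
_BIT_TYPE = re.compile(r"^\s*BIT\s*(?:\(\s*(\d+)\s*\))?\s*$", re.IGNORECASE)


def bit_length(type_expr: str) -> int:
    """Return the length of a MySQL BIT type expression, defaulting to 1."""
    match = _BIT_TYPE.match(type_expr)
    if match is None:
        raise ValueError(f"not a BIT type: {type_expr!r}")
    declared = match.group(1)
    # A BIT column declared without an explicit length is BIT(1) in MySQL.
    return int(declared) if declared is not None else 1


print(bit_length("BIT"))     # 1
print(bit_length("BIT(8)"))  # 8
```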
New features and improvements
Debezium 2.5 also introduces more improvements and features; let’s take a look at each individually.
Re-select columns
In some cases, because of the way that certain source databases function, when a Debezium connector emits a change event, the event might exclude values for specific column types. For example, values for TOAST columns in PostgreSQL, LOB columns in Oracle, or Extended String columns in Oracle Exadata, might all be excluded.
Debezium 2.5 introduces the ReselectColumnsPostProcessor, providing a way to re-select one or more columns from a database table and fetch the current state. You can configure the post processor to re-select the following column types:
-
null columns.
-
columns that contain the unavailable.value.placeholder sentinel value.
Configuring a PostProcessor is similar to configuring a CustomConverter or Transformation, except that it works on the mutable payload’s Struct rather than the SourceRecord.
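The re-selection idea described above can be sketched in a few lines of Python. This is a conceptual illustration only, not the ReselectColumnsPostProcessor implementation: the function `reselect_columns`, the `fetch_current` callback, and the flag names are hypothetical, and the placeholder string is an assumed value that stands in for whatever unavailable.value.placeholder is set to on your connector. Unlike the real post processor, which modifies the event payload in place, the sketch returns a copy.

```python
from typing import Any, Callable, Dict, Iterable

# Assumed sentinel; use the value configured for
# unavailable.value.placeholder on the connector.
PLACEHOLDER = "__debezium_unavailable_value"


def reselect_columns(
    after: Dict[str, Any],
    fetch_current: Callable[[Iterable[str]], Dict[str, Any]],
    reselect_null: bool = True,
    reselect_unavailable: bool = True,
) -> Dict[str, Any]:
    """Return a copy of `after` with null or placeholder columns refreshed."""
    stale = [
        name
        for name, value in after.items()
        if (reselect_null and value is None)
        or (reselect_unavailable and value == PLACEHOLDER)
    ]
    if not stale:
        return dict(after)
    # Hypothetical lookup of the row's current state in the source table.
    fresh = fetch_current(stale)
    return {**after, **{name: fresh[name] for name in stale if name in fresh}}


# Usage with an in-memory stand-in for the current database row.
current_row = {"id": 1, "title": "Hello", "body": "full text", "notes": None}


def fetch(columns: Iterable[str]) -> Dict[str, Any]:
    return {column: current_row[column] for column in columns}


event_after = {"id": 1, "title": "Hello", "body": PLACEHOLDER, "notes": None}
print(reselect_columns(event_after, fetch))
```

Because the lookup returns the current state of the row, a re-selected value reflects the table at query time, which may be newer than the change that produced the event.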
Debezium Server - StreamNameMapper for Apache Kafka sink
The behavior of the Kafka sink can now be modified with custom logic by providing alternative implementations for specific functionality. When no alternative implementation is available, the default one is used.
For more details, please see the Apache Kafka Injection points.
INSERT/DELETE semantics for incremental snapshot watermarking
The property incremental.snapshot.watermarking.strategy has been introduced to let users choose the watermarking strategy to use during an incremental snapshot.
With the insert_insert (old behavior) approach, Debezium creates two entries in the signaling data collection for each chunk during the snapshot: one to signal the opening of the snapshot window and another to mark its closure.
On the other hand, with the insert_delete option, a single entry is written in the signaling data collection for each chunk at the beginning of the window. After completion, this entry is removed, and no corresponding entry is added to signify the closure of the snapshot window. This method aids in more efficient management of the signaling data collection.
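The difference between the two strategies is easiest to see by counting the rows left behind in the signaling data collection. The following Python sketch models that collection as an in-memory dictionary; the helpers `snapshot_chunk` and `rows_left`, the row layout, and the id format are hypothetical and only mimic the behavior described above, not Debezium's implementation.

```python
import uuid
from typing import Dict, Tuple

# In-memory stand-in for the signaling data collection: id -> (type, data).
SignalTable = Dict[str, Tuple[str, str]]

STRATEGIES = ("insert_insert", "insert_delete")


def snapshot_chunk(signals: SignalTable, strategy: str) -> None:
    """Write the watermark entries for a single chunk (hypothetical helper)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    open_id = f"{uuid.uuid4()}-open"
    signals[open_id] = ("snapshot-window-open", "")
    # ... the chunk would be read from the source table here ...
    if strategy == "insert_insert":
        # A second entry marks the closure of the window.
        signals[f"{uuid.uuid4()}-close"] = ("snapshot-window-close", "")
    else:
        # insert_delete: removing the opening entry marks the closure.
        del signals[open_id]


def rows_left(strategy: str, chunks: int) -> int:
    """Count signaling rows remaining after snapshotting `chunks` chunks."""
    signals: SignalTable = {}
    for _ in range(chunks):
        snapshot_chunk(signals, strategy)
    return len(signals)


print(rows_left("insert_insert", 5))  # 10
print(rows_left("insert_delete", 5))  # 0
```

In this model, insert_insert leaves two rows per chunk behind, while insert_delete leaves none, which is why the latter keeps the signaling data collection from growing during long snapshots.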
For more details, please see the Connector properties section of the documentation for the connector you are interested in.
Debezium Server - AWS SQS sink
Amazon Simple Queue Service (Amazon SQS) is a distributed message queuing service. It supports programmatic sending of messages via web service applications as a way to communicate over the Internet. SQS is intended to provide a highly scalable hosted message queue that resolves issues arising from the common producer–consumer problem or connectivity between producer and consumer.
Debezium 2.5 offers the possibility to send events to Amazon SQS.
Oracle LOB behavior
Debezium 2.5 aligns LOB behavior in snapshot and streaming. When lob.enabled is set to false, the unavailable value placeholder will be explicitly included during snapshot to match the behavior of streaming.
Other fixes
In addition, there were quite a number of stability and bug fixes that made it into this release. These include the following:
-
Oracle abandoned transaction implementation bug causes OoM DBZ-7236
-
Add Grammar Oracle Truncate Cluster DBZ-7242
-
Length value is not removed when changing a column’s type DBZ-7251
-
MongoDB table/collection snapshot notification contain incorrect offsets DBZ-7252
-
Broken support for multi-namespace watching DBZ-7254
-
Add tracing logs to track execution time for Debezium JDBC connector DBZ-7217
-
Validate & clarify multiple archive log destination requirements for Oracle DBZ-7218
-
Upgrade logback to 1.2.13 DBZ-7232
Altogether, 16 issues were fixed for this release. A big thank you to all the contributors from the community who worked on this release: Bob Roldan, Chris Cranford, Gunnar Morling, Harvey Yue, Ilyas Ahsan, Indra Shukla, Jakub Cechacek, Jiabao Sun, Jiri Kulhanek, Jiri Pechanec, Jordan Pittier, Mario Fiore Vitale, Nils Hartmann, Roman Kudryashov, Sebastiaan Knijnenburg, Tudor Plugaru, V K, and Zhongqiang Gong!
What’s next?
We have just over a week before the team takes a break for the holidays, and so we are preparing for the Debezium 2.5 final release. We intend to release it the week before the holiday break.
The team has also finalized the roadmap for 2024; here’s a sneak peek at some highlights (and remember, this is just the tip of the iceberg!):
-
Asynchronous-based processing in Debezium Engine
-
Official MariaDB connector
-
User-friendly offset manipulation (i.e., start at a specific position in the transaction logs)
-
Sink connector for MongoDB
For more details, please check out our roadmap for what is coming in Debezium 2.6 and beyond.
As always, please be sure to get in touch with us on the mailing list or Zulip chat if you have questions or feedback. Until next time, stay warm out there!
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.