As the summer concludes for us in the north and we await the autumn colors, the team has been busy preparing for the next major release, Debezium 2.4. It’s my pleasure to announce today that we are nearly there with the release of Debezium 2.4.0.CR1.

The focus for this release is primarily on stability; however, we do have a few new last-minute additions that we should highlight, so let’s dive right in, shall we?!

Breaking Changes

The community-led Vitess connector previously retried only a subset of errors by default. This behavior has changed: the connector now retries all errors except those that are explicitly defined as non-retriable. For more details, please see DBZ-6944.
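In effect, the default moved from an allow-list of retriable errors to a deny-list of non-retriable ones. The minimal Python sketch below contrasts the two policies. It is not the connector's code: the predicate names and the error codes in both sets are hypothetical, chosen only to show the difference in behavior.

```python
# Hypothetical error codes, for illustration only; these are NOT the
# Vitess connector's actual retry classifications.
RETRIABLE_BEFORE = frozenset({"UNAVAILABLE"})         # old default: explicit allow-list
NON_RETRIABLE_NOW = frozenset({"PERMISSION_DENIED"})  # new default: explicit deny-list


def should_retry_before(error_code: str) -> bool:
    """Old default: retry only errors that appear on the allow-list."""
    return error_code in RETRIABLE_BEFORE


def should_retry_now(error_code: str) -> bool:
    """New default: retry every error unless it appears on the deny-list."""
    return error_code not in NON_RETRIABLE_NOW


if __name__ == "__main__":
    for code in ("UNAVAILABLE", "DEADLINE_EXCEEDED", "PERMISSION_DENIED"):
        print(code, should_retry_before(code), should_retry_now(code))
```

In this sketch, an error that the old allow-list never anticipated, such as `DEADLINE_EXCEEDED`, is now retried, while only the explicitly listed `PERMISSION_DENIED` is not.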

New Features

MongoDB parallel incremental snapshots

Since the introduction of incremental snapshots back in Debezium 1.x, the process of incrementally snapshotting existing data while concurrently capturing changes from the database transaction log has been single-threaded. When adding new features, it’s not uncommon to focus on the basics first and build upon that foundation, which is precisely what has happened with MongoDB.

In Debezium 2.4, we are taking the first steps toward parallel incremental snapshots with the MongoDB connector, which can now read multiple chunks in parallel. This should increase throughput, at the cost of additional memory while the chunks are collected, sorted, and deduplicated against the changes captured from the transaction log. Thanks to Yue Wang for starting this effort in DBZ-6518; parallel incremental snapshots are most definitely something we are looking to explore for the relational connectors in an upcoming Debezium release.
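The read, sort, and deduplicate cycle can be illustrated with a small Python sketch under simplifying assumptions; it is not Debezium's implementation. The names `read_chunk` and `parallel_snapshot`, the in-memory list standing in for a MongoDB collection, and the rule that any key seen by the streaming side during the snapshot window wins over the snapshot copy are all hypothetical simplifications of the deduplication that incremental snapshots perform.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_chunk(collection, start, end):
    """Hypothetical chunk reader: return the documents whose _id is in [start, end)."""
    return [doc for doc in collection if start <= doc["_id"] < end]


def parallel_snapshot(collection, chunk_bounds, streamed_keys, max_workers=4):
    """Read several chunks concurrently, then sort and deduplicate them.

    streamed_keys holds the _id values of documents that the streaming side
    captured while the chunks were being read; those streamed events are newer,
    so the matching snapshot copies are dropped.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(read_chunk, collection, lo, hi) for lo, hi in chunk_bounds]
        # Chunks finish in arbitrary order, and every row is buffered in memory here.
        buffered = [doc for future in as_completed(futures) for doc in future.result()]

    # Restore key order across chunks before emitting.
    buffered.sort(key=lambda doc: doc["_id"])
    # Deduplicate against the keys already captured by the streaming side.
    return [doc for doc in buffered if doc["_id"] not in streamed_keys]


if __name__ == "__main__":
    collection = [{"_id": i, "value": i * 10} for i in range(10)]
    bounds = [(0, 4), (4, 8), (8, 10)]
    streamed = {2, 7}  # keys changed while the chunks were being read
    print([doc["_id"] for doc in parallel_snapshot(collection, bounds, streamed)])
    # [0, 1, 3, 4, 5, 6, 8, 9]
```

Buffering every chunk before sorting is where memory is traded for throughput: all rows read in parallel must be held until they can be merged and deduplicated.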

PostgreSQL 16 support

The PostgreSQL project announced the release of PostgreSQL 16 just over a week ago, and we’re pleased to announce that Debezium 2.4 will support it.

PostgreSQL 16 introduces logical replication from standby servers; however, we have not yet tested this feature with Debezium, and support for it will come in a later Debezium release. For now, logical replication is supported only from the primary server.

Google Spanner GKE workload identity support

Google Kubernetes Engine (GKE) supports workload identity, which lets you use a more secure authentication mechanism than traditional JSON-based keys. In Debezium 2.4, when no JSON key is explicitly set, the Spanner connector now automatically defaults to GKE workload identity authentication. Thanks to laughingman7743 for this effort as part of DBZ-6885.
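The fallback rule can be expressed as a minimal Python sketch. The property names in `KEY_PROPERTIES` are placeholders, not the Spanner connector's actual configuration keys, and the hypothetical `resolve_auth_mode` function only reports which mechanism would be chosen rather than performing any authentication.

```python
from typing import Mapping

# Placeholder property names for illustration only; consult the Spanner
# connector documentation for the real configuration keys.
KEY_PROPERTIES = ("credentials.json", "credentials.path")


def resolve_auth_mode(config: Mapping[str, str]) -> str:
    """Report which authentication mechanism would be used.

    An explicitly configured JSON key takes precedence; when none is set
    (or the value is blank), fall back to GKE workload identity.
    """
    for prop in KEY_PROPERTIES:
        if config.get(prop, "").strip():
            return f"json-key ({prop})"
    return "gke-workload-identity"


if __name__ == "__main__":
    print(resolve_auth_mode({}))
    print(resolve_auth_mode({"credentials.path": "/secrets/key.json"}))
```

Treating a blank value the same as a missing one is a design choice of this sketch, so that an empty placeholder in a configuration file does not block the fallback.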

Other Fixes

  • Ad-hoc blocking snapshot trigger emits schema changes for all tables DBZ-6828

  • When the start_scn of a transaction in V$TRANSACTION is 0, log mining starts from the oldest SCN when the Oracle connector is started for the first time DBZ-6869

  • Ensure that the connector can handle rebalance events robustly DBZ-6870

  • OpenLogReplicator confirmation can resend or omit events on restarts DBZ-6895

  • ExtractNewRecordState’s schema cache is not updated on arrival of the DDL change event DBZ-6901

  • Misleading Debezium error message when RDI port is not specified in application.properties DBZ-6902

  • Generating protobuf files to target/generated-sources breaks build DBZ-6903

  • Clean log printout in Redis Debezium Sink DBZ-6908

  • Values being omitted from a list of JSON objects DBZ-6910

  • Fix logger name DBZ-6935

  • MySQL connector gets NPE when snapshot.mode is set to never and a signal data collection is configured DBZ-6937

  • Sanity check / retry for redo logs does not work per Oracle RAC thread DBZ-6938

  • Drop events have wrong table changes information DBZ-6945

  • Remove spaces from Signal and Notification MBean’s ObjectName DBZ-6957

Altogether, 20 issues were fixed for this release. A big thank you to all the contributors from the community who worked on this release: Andy Pickler, Anisha Mohanty, Breno Moreira, Chris Cranford, Harvey Yue, Indra Shukla, Jakub Cechacek, Jiri Pechanec, Mario Fiore Vitale, Nancy Xu, Nir Levy, Ondrej Babec, René Kerner, Sergey Eizner, Thomas Thornton, Wu Zhenhua, Zheng Wang, laughingman7743, and tison!

Outlook and What’s next?

We’re now at the point where we begin to set our sights on Debezium 2.5 and what lies ahead. We recently held our first Community Meeting and discussed a number of our 2.5 roadmap ideas, some of which include:

  • Parallel incremental snapshots for relational connectors.

  • Improved MongoDB support for BSON documents exceeding 16MB.

  • Db2 support on z/OS and iSeries platforms.

  • Batch support in the JDBC sink connector.

  • Parallelization of tasks and other Debezium Engine internals.

  • Preview of MariaDB and Oracle 23 support.

For more information, please check out our road map for all the upcoming details on Debezium 2.5 and beyond.

Debezium will also be at Current 2023 next week. If you are attending, be sure to stop by the Ask-The-Experts session on Debezium and Kafka on Wednesday at 2:30PM. Also, be sure to check out the sponsored session on Wednesday at 4:30PM to find out just how easy it is to deploy data pipelines from the edge to the cloud using open-source projects such as Debezium, Strimzi, Apicurio, and Kubernetes.

As always, if you have any questions, suggestions, or feedback, please reach out to us on our mailing list or chat. We always enjoy hearing what you have to share. Until next time, be safe.

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas on how we can improve Debezium, please let us know or log an issue.