A new year, a new preview release, in true Debezium fashion. The team is pleased to announce the first installment of the Debezium 2.6 release stream, Debezium 2.6.0.Alpha1. Let’s take a moment to dive into the new features and see how you can use them to improve your change data capture experience…
The team aims to avoid any potential breaking changes between minor releases; however, such changes are sometimes inevitable.
The MongoDB connector no longer supports the replica_set mode (DBZ-7260). This mode has been deprecated for several versions, and work to remove it has been ongoing throughout Debezium 2.x. If you are using the replica_set mode, you will need to adjust your configuration when moving to Debezium 2.6+.
Re-select Columns Post Processor
The re-select columns post processor previously built its query using the key derived from message.key.columns. This is not correct for most tables with primary keys. The default behavior has changed so that the table’s primary key is now used. A new configuration option was introduced to let the user choose between the primary key and the generated key.
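To make the difference concrete, here is a minimal Java sketch of how the choice of key columns changes the re-select query. It is not Debezium’s implementation: the class, the method names, the useEventKey flag, and the table and column names are all hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these names come from Debezium.
final class ReselectQuerySketch {

    /** Picks the key columns: the primary key by default, the event key on request. */
    static List<String> chooseKeyColumns(boolean useEventKey, List<String> primaryKey, List<String> eventKey) {
        return useEventKey ? eventKey : primaryKey;
    }

    /** Builds a parameterized SELECT that re-reads the given columns for a single row. */
    static String buildQuery(String table, List<String> reselectColumns, List<String> keyColumns) {
        if (keyColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one key column is required");
        }
        String projection = String.join(", ", reselectColumns);
        String predicate = keyColumns.stream()
                .map(column -> column + " = ?")
                .collect(Collectors.joining(" AND "));
        return "SELECT " + projection + " FROM " + table + " WHERE " + predicate;
    }

    public static void main(String[] args) {
        List<String> primaryKey = List.of("id");            // assumed table primary key
        List<String> eventKey = List.of("tenant", "sku");   // assumed custom message key
        List<String> columns = List.of("description");

        // Default behavior: the WHERE clause uses the primary key.
        System.out.println(buildQuery("inventory.products", columns,
                chooseKeyColumns(false, primaryKey, eventKey)));
        // Opt-in behavior: the WHERE clause uses the event key columns instead.
        System.out.println(buildQuery("inventory.products", columns,
                chooseKeyColumns(true, primaryKey, eventKey)));
    }
}
```

If the custom key does not uniquely identify a row, the second query may match several rows, which is why defaulting to the primary key is the safer choice.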
Improvements and changes
New Matching Collections API added
One of the team’s ongoing tasks is the migration of the Debezium UI’s backend into the main Debezium repository. A benefit of doing this is that we can identify where code overlaps between a connector’s runtime and the UI, and develop interface contracts to expose this shared data.
Thanks to a community contribution for DBZ-7167, the RelationalBaseSourceConnector contract has been adjusted and a new method introduced to return a list of table names that match the connector’s specific configuration. Any connector that implements this abstract base class will need to implement this new method.
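The shape of such a contract can be sketched in a few lines of Java. This is an illustration only, not the actual RelationalBaseSourceConnector API: the class names, the matchingTables method, and the in-memory table list are hypothetical. The sketch assumes the tables are selected with a comma-separated list of regular expressions in table.include.list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical stand-in for a connector base class; not the Debezium contract.
abstract class BaseConnectorSketch {
    /** Returns the names of the tables that match the given connector configuration. */
    abstract List<String> matchingTables(Map<String, String> config);
}

// Hypothetical connector that knows its tables up front instead of querying a database.
final class InMemoryConnectorSketch extends BaseConnectorSketch {
    private final List<String> knownTables;

    InMemoryConnectorSketch(List<String> knownTables) {
        this.knownTables = List.copyOf(knownTables);
    }

    @Override
    List<String> matchingTables(Map<String, String> config) {
        String includeList = config.getOrDefault("table.include.list", "");
        if (includeList.isBlank()) {
            return knownTables; // no filter configured: every table matches
        }
        // Each comma-separated element is treated as a regex over the full table name.
        List<Pattern> patterns = Arrays.stream(includeList.split(","))
                .map(String::trim)
                .filter(element -> !element.isEmpty())
                .map(Pattern::compile)
                .collect(Collectors.toList());
        return knownTables.stream()
                .filter(table -> patterns.stream().anyMatch(p -> p.matcher(table).matches()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        BaseConnectorSketch connector = new InMemoryConnectorSketch(
                List.of("inventory.orders", "inventory.customers", "audit.log"));
        System.out.println(connector.matchingTables(Map.of("table.include.list", "inventory\\..*")));
    }
}
```

A UI backend could call a method like this to preview which tables a configuration would capture before the connector is ever deployed.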
CloudEvents schema name customization
When using a schema registry, event schemas need to be registered with a name so that pipelines can look them up later. The same applies when pairing CloudEvents formatted messages with a schema registry, and in Debezium 2.6 you can explicitly control how the name is registered.
By default, the schema for a CloudEvent message will be automatically generated by the converter. However, if the auto-generated schema names are not sufficient, you can adjust the configuration by specifying dataSchemaName, which can be set either to generate (the default behavior) or header to pull the schema name directly from the specified event header field.
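The two settings can be pictured with a small Java sketch of the decision they control. This is not the converter’s code: the class, the enum, the resolve method, the header field name, and the sample schema names are hypothetical, and failing fast on a missing header is an assumption of this sketch rather than documented behavior.

```java
import java.util.Map;

// Hypothetical sketch of choosing a schema name; not the CloudEvents converter itself.
final class DataSchemaNameSketch {

    /** Mirrors the two values described above: generate (default) and header. */
    enum Source { GENERATE, HEADER }

    static String resolve(Source source, Map<String, String> headers, String headerField, String generatedName) {
        if (source == Source.HEADER) {
            String fromHeader = headers.get(headerField);
            if (fromHeader == null || fromHeader.isBlank()) {
                // Assumption: this sketch treats a missing header as an error.
                throw new IllegalStateException("header '" + headerField + "' carries no schema name");
            }
            return fromHeader;
        }
        return generatedName; // default: keep the name the converter generated
    }

    public static void main(String[] args) {
        Map<String, String> headers = Map.of("dataSchemaName", "orders.v1"); // hypothetical header
        System.out.println(resolve(Source.GENERATE, headers, "dataSchemaName", "generated.Name"));
        System.out.println(resolve(Source.HEADER, headers, "dataSchemaName", "generated.Name"));
    }
}
```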
Oracle Infinispan cache improvements
The Debezium Oracle connector maintains a buffer of all in-flight transactions, and this buffer can be allocated off-heap using Infinispan. Sometimes, the user configuration specifies that if an in-flight transaction lasts longer than the specified number of milliseconds, the transaction can be abandoned or discarded by the buffer. This means that the transaction will be forgotten and not emitted by the connector.
To improve metrics integration with frameworks like Grafana and Prometheus, a new JMX metric, AbandonedTransactionCount, was added to track the number of transactions that are abandoned by the connector during its runtime.
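The relationship between the retention window and the counter can be sketched as follows. This is a simplified, heap-based illustration under stated assumptions, not the connector’s Infinispan-backed buffer: the class, its methods, and the transaction ids are hypothetical, and only the getter name echoes the AbandonedTransactionCount metric.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a transaction buffer with a retention window.
final class TransactionBufferSketch {
    private final Map<String, Instant> startTimes = new LinkedHashMap<>();
    private final Duration retention;
    private long abandonedTransactionCount;

    TransactionBufferSketch(Duration retention) {
        this.retention = retention;
    }

    void begin(String transactionId, Instant startedAt) {
        startTimes.put(transactionId, startedAt);
    }

    void commit(String transactionId) {
        startTimes.remove(transactionId); // committed transactions leave the buffer
    }

    /** Drops every in-flight transaction older than the retention window and counts it. */
    int abandonExpired(Instant now) {
        int abandoned = 0;
        Iterator<Map.Entry<String, Instant>> entries = startTimes.entrySet().iterator();
        while (entries.hasNext()) {
            Map.Entry<String, Instant> entry = entries.next();
            if (Duration.between(entry.getValue(), now).compareTo(retention) > 0) {
                entries.remove(); // forgotten: its changes will never be emitted
                abandoned++;
            }
        }
        abandonedTransactionCount += abandoned;
        return abandoned;
    }

    long getAbandonedTransactionCount() {
        return abandonedTransactionCount;
    }

    int inFlight() {
        return startTimes.size();
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        TransactionBufferSketch buffer = new TransactionBufferSketch(Duration.ofMillis(1000));
        buffer.begin("tx-1", start);
        buffer.begin("tx-2", start.plusMillis(900));
        buffer.abandonExpired(start.plusMillis(1500)); // only tx-1 exceeds the window
        System.out.println(buffer.getAbandonedTransactionCount());
    }
}
```

A counter that keeps rising suggests the retention window is too short for the workload or that long-running transactions are being left open, both of which mean changes are being dropped silently.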
NEW_ROW_AND_OLD_VALUES value capture type
Google Spanner’s value capture type controls how the change stream represents the change data in the event stream, and it is configured when constructing the change stream.
Spanner introduced a new value capture mode called NEW_ROW_AND_OLD_VALUES, which is responsible for capturing all values of tracked columns, both modified and unmodified, whenever any column changes. This new mode is an improvement over NEW_ROW because it also includes the capture of old values, making it align with what you typically observe with other Debezium connectors.
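Because the capture type is chosen when the change stream is created, opting into the new mode is a matter of the stream’s DDL. The Java sketch below only assembles such a statement as a string. The helper class and the stream name are hypothetical, and the statement follows the general shape of Spanner’s CREATE CHANGE STREAM syntax, so check it against the Spanner DDL reference before running it.

```java
// Hypothetical helper that assembles change stream DDL; it does not talk to Spanner.
final class ChangeStreamDdlSketch {

    /** Value capture types, including the new NEW_ROW_AND_OLD_VALUES mode. */
    enum ValueCaptureType { OLD_AND_NEW_VALUES, NEW_VALUES, NEW_ROW, NEW_ROW_AND_OLD_VALUES }

    static String createChangeStream(String streamName, ValueCaptureType captureType) {
        // Accept only simple identifiers so the name can be concatenated safely.
        if (!streamName.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("invalid change stream name: " + streamName);
        }
        return "CREATE CHANGE STREAM " + streamName
                + " FOR ALL OPTIONS (value_capture_type = '" + captureType.name() + "')";
    }

    public static void main(String[] args) {
        // "OrdersStream" is a made-up stream name used for illustration.
        System.out.println(createChangeStream("OrdersStream", ValueCaptureType.NEW_ROW_AND_OLD_VALUES));
    }
}
```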
Altogether, 25 issues were fixed in this release:
Empty object sent to GCP Pub/Sub after DELETE event DBZ-7098
Notifications are Missing the ID field in log channel DBZ-7249
Debezium-ddl-parser crashes on parsing MySQL DDL statement (sub-query with UNION) DBZ-7259
Oracle DDL parsing error in PARTITION REFERENCE DBZ-7266
Enhance Oracle’s CREATE TABLE for Multiple Table Specifications DBZ-7286
Add service loader manifests for all Connect plugins DBZ-7298
PostgreSQL ad-hoc blocking snapshots fail when snapshot mode is "never" DBZ-7311
Ad-hoc blocking snapshot dies with "invalid snapshot identifier" immediately after connector creation DBZ-7312
Specifying a table include list with spaces between elements cause LogMiner queries to miss matches DBZ-7315
Debezium heartbeat.action.query does not start before writing to WAL: part 2 DBZ-7316
Update Groovy version to 4.x DBZ-7340
errors.max.retries is not used to stop retrying DBZ-7342
Upgrade Antora to 3.1.7 DBZ-7344
Oracle connector is occasionally unable to find SCN DBZ-7345
Initial snapshot notifications should use full identifier. DBZ-7347
Upgrade Outbox Extension to Quarkus 3.6.5 DBZ-7352
MySqlJdbcSinkDataTypeConverterIT#testBooleanDataTypeMapping fails DBZ-7355
A big thank you to all the contributors from the community who worked on this release: Anisha Mohanty, Artem Shubovych, Bob Roldan, Chris Cranford, Ilyas Ahsan, Indra Shukla, Jakub Cechacek, James Johnston, Jiri Kulhanek, Jiri Pechanec, Mario Fiore Vitale, Mickael Maison, Ondrej Babec, Peter Hamer, Richard Harrington, Robert Roldan, Roman Kudryashov, Shuran Zhang, Vincenzo Santonastaso, Vojtech Juranek, and حمود سمبول!
Outlook & What’s next?
The Debezium 2.6 release cycle is one of our most ambitious initiatives with lots of new features and changes. You can find more about what the team is working on specifically for 2.6 and the road to Debezium 3.0 in our road map. If you have any suggestions or ideas, please feel free to get in touch with us on our mailing list or in our Zulip chat.
As the team continues springing into action with Debezium 2.6, we also intend to keep fixing bugs and addressing any regressions reported against last quarter’s Debezium 2.5 release. Debezium 2.5 is now the project’s stable release, and we encourage everyone to upgrade and get the latest and greatest features. In fact, you can expect the next maintenance release of Debezium, 2.5.1.Final, to be released later this week :).
Until next time, happy streaming!
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.