A new year, a new preview release, in true Debezium fashion. The team is pleased to announce the first installment of the Debezium 2.6 release stream, Debezium 2.6.0.Alpha1. Let’s take a moment to dive into the new features and see how you can use them to improve your change data capture experience…

Breaking changes

The team aims to avoid any potential breaking changes between minor releases; however, such changes are sometimes inevitable.

MongoDB
  • The MongoDB connector no longer supports the replica_set mode (DBZ-7260). This mode has been deprecated for several versions, and its removal has been an ongoing effort throughout Debezium 2.x. If you are using the replica_set mode, you will need to adjust your configuration when moving to Debezium 2.6+.

Re-select Columns Post Processor
  • The re-select columns post processor previously built its query using a key based on message.key.columns. This is not correct for most tables with primary keys, so the default behavior has changed and the table’s primary key is now used. A new configuration option, reselect.use.event.key, lets you choose between the primary key and the generated event key (DBZ-7358).
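To make the new option concrete, here is a minimal sketch of a connector configuration fragment built with java.util.Properties. Only reselect.use.event.key comes from this release note. The reselector alias is an arbitrary name chosen for the example, and the post.processors key, the alias prefix, and the processor class name are assumptions about how post processors are registered, so verify them against the documentation for your version.

```java
import java.util.Properties;

class ReselectConfigSketch {

    // Builds a configuration fragment for the re-select columns post processor.
    static Properties reselectConfig(boolean useEventKey) {
        Properties props = new Properties();
        // "reselector" is an arbitrary alias chosen for this sketch.
        props.setProperty("post.processors", "reselector");
        // Assumed implementation class; check the documentation for your version.
        props.setProperty("reselector.type",
                "io.debezium.processors.reselect.ReselectColumnsPostProcessor");
        // false: build the re-select query from the table's primary key (the new default).
        // true: build it from the event key, for example one derived from message.key.columns.
        props.setProperty("reselector.reselect.use.event.key", Boolean.toString(useEventKey));
        return props;
    }

    public static void main(String[] args) {
        reselectConfig(true).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Passing true would restore the earlier key selection for tables where the event key is the right lookup key.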

Improvements and changes

New Matching Collections API added

One of the team’s ongoing tasks is the migration of the Debezium UI’s backend into the main Debezium repository. One of the unique benefits of doing this is that we can identify where code overlaps between a connector’s runtime and the UI, and develop interface contracts to expose this shared data.

Thanks to a community contribution for DBZ-7167, the RelationalBaseSourceConnector contract has been adjusted and a new method introduced to return a list of table names that match the connector’s specific configuration. Any connector that implements this abstract base class will need to implement this new method.
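The idea behind such a method can be sketched as follows. This is not Debezium’s actual contract: the MatchingCollections interface and its matchingTables method are hypothetical stand-ins, and the sketch simply assumes the connector configuration carries a comma-separated list of regular expressions that must match a fully-qualified table name in full.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class MatchingCollectionsSketch {

    // Hypothetical stand-in for the connector contract; not Debezium's actual signature.
    interface MatchingCollections {
        List<String> matchingTables(String includeList);
    }

    // Returns an implementation that filters the given tables by the include list.
    static MatchingCollections forTables(List<String> availableTables) {
        return includeList -> {
            // Split the include list into trimmed, non-empty regular expressions.
            List<Pattern> patterns = Arrays.stream(includeList.split(","))
                    .map(String::trim)
                    .filter(s -> !s.isEmpty())
                    .map(Pattern::compile)
                    .collect(Collectors.toList());
            // Keep only tables whose full name matches at least one expression.
            return availableTables.stream()
                    .filter(t -> patterns.stream().anyMatch(p -> p.matcher(t).matches()))
                    .collect(Collectors.toList());
        };
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("inventory.orders", "inventory.customers", "audit.log");
        System.out.println(forTables(tables).matchingTables("inventory\\..*"));
    }
}
```

A UI backend could call a method of this kind to preview which tables a given configuration would capture before the connector is deployed.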

CloudEvents schema name customization

When using a schema registry, event schemas must be registered under a name so that pipelines can look them up later. The same applies when pairing CloudEvents-formatted messages with a schema registry, and in Debezium 2.6 you can explicitly control the name under which the schema is registered.

By default, the schema name for a CloudEvent message is generated automatically by the converter. If the auto-generated schema names are not sufficient, you can adjust the configuration by specifying dataSchemaName, which can be set either to generate (the default behavior) or to header, which pulls the schema name directly from the specified event header field.
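As a rough illustration, the setting might be expressed as below. Only dataSchemaName and its two values come from this post. The sketch assumes that dataSchemaName is supplied as one entry of the CloudEvents converter’s metadata.source option and that the converter is configured as the value converter, so treat the property names as assumptions to check against the converter documentation.

```java
import java.util.Properties;

class CloudEventsSchemaNameSketch {

    // Builds converter settings that select where the data schema name comes from.
    static Properties converterConfig(String schemaNameSource) {
        if (!schemaNameSource.equals("generate") && !schemaNameSource.equals("header")) {
            throw new IllegalArgumentException("expected 'generate' or 'header'");
        }
        Properties props = new Properties();
        props.setProperty("value.converter", "io.debezium.converters.CloudEventsConverter");
        // Assumption: dataSchemaName is one entry of the converter's metadata.source option.
        props.setProperty("value.converter.metadata.source",
                "value,dataSchemaName:" + schemaNameSource);
        return props;
    }

    public static void main(String[] args) {
        converterConfig("header").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```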

Oracle Infinispan cache improvements

The Debezium Oracle connector maintains a buffer of all in-flight transactions, and this buffer can be allocated off-heap using Infinispan. The connector can be configured so that an in-flight transaction lasting longer than a specified number of milliseconds is abandoned, that is, discarded from the buffer. An abandoned transaction is forgotten and never emitted by the connector.

To improve metrics integration with frameworks like Grafana and Prometheus, a new JMX metric, AbandonedTransactionCount, was added to track the number of transactions abandoned by the connector during its runtime.
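For anyone who wants to inspect the metric directly, the following sketch reads it over JMX using the standard javax.management API. The attribute name comes from this post. The MBean name pattern (debezium.oracle domain, streaming context, server set to the connector’s topic prefix) and the server1 prefix are assumptions, so confirm the exact object name in a JMX browser for your deployment.

```java
import java.lang.management.ManagementFactory;
import java.util.OptionalLong;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class AbandonedTransactionMetricSketch {

    // Assumed object name of the Oracle connector's streaming metrics MBean.
    static ObjectName streamingMetrics(String topicPrefix) {
        try {
            return new ObjectName(
                    "debezium.oracle:type=connector-metrics,context=streaming,server=" + topicPrefix);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("invalid topic prefix: " + topicPrefix, e);
        }
    }

    // Reads the counter, or returns empty when the MBean is not registered.
    static OptionalLong abandonedTransactionCount(MBeanServerConnection connection, String topicPrefix) {
        ObjectName name = streamingMetrics(topicPrefix);
        try {
            if (!connection.isRegistered(name)) {
                return OptionalLong.empty();
            }
            Object value = connection.getAttribute(name, "AbandonedTransactionCount");
            return OptionalLong.of(((Number) value).longValue());
        } catch (Exception e) {
            throw new IllegalStateException("could not read " + name, e);
        }
    }

    public static void main(String[] args) {
        // The platform MBean server only sees connectors running in this same JVM.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        System.out.println(abandonedTransactionCount(local, "server1"));
    }
}
```

For a connector running in a separate Kafka Connect process, the same method would work with an MBeanServerConnection obtained from a remote JMX connector instead of the platform MBean server.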

Supports Spanner NEW_ROW_AND_OLD_VALUES value capture type

Google Spanner’s value capture type controls how a change stream represents change data in the event stream, and it is configured when the change stream is created.

Spanner introduced a new value capture type called NEW_ROW_AND_OLD_VALUES, which captures all values of tracked columns, both modified and unmodified, whenever any column changes. This new type is an improvement over NEW_ROW because it also captures old values, which aligns it with what you typically observe with other Debezium connectors.
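Because the capture type is chosen when the change stream is created, a small sketch of the corresponding DDL may help. It builds a GoogleSQL CREATE CHANGE STREAM statement as a string. The stream and table names are hypothetical, and the statement would still need to be submitted through whichever Spanner tool or client you use.

```java
class SpannerChangeStreamDdlSketch {

    // Builds DDL for a change stream that watches one table with the new capture type.
    static String createChangeStreamDdl(String streamName, String tableName) {
        return "CREATE CHANGE STREAM " + streamName
                + " FOR " + tableName
                + " OPTIONS (value_capture_type = 'NEW_ROW_AND_OLD_VALUES')";
    }

    public static void main(String[] args) {
        // "OrdersStream" and "Orders" are hypothetical names used for illustration.
        System.out.println(createChangeStreamDdl("OrdersStream", "Orders"));
    }
}
```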

Other changes

Altogether, 25 issues were fixed in this release:

  • Empty object sent to GCP Pub/Sub after DELETE event DBZ-7098

  • Notifications are Missing the ID field in log channel DBZ-7249

  • Debezium-ddl-parser crashes on parsing MySQL DDL statement (sub-query with UNION) DBZ-7259

  • Oracle DDL parsing error in PARTITION REFERENCE DBZ-7266

  • Enhance Oracle’s CREATE TABLE for Multiple Table Specifications DBZ-7286

  • Add service loader manifests for all Connect plugins DBZ-7298

  • PostgreSQL ad-hoc blocking snapshots fail when snapshot mode is "never" DBZ-7311

  • Ad-hoc blocking snapshot dies with "invalid snapshot identifier" immediately after connector creation DBZ-7312

  • Specifying a table include list with spaces between elements cause LogMiner queries to miss matches DBZ-7315

  • Debezium heartbeat.action.query does not start before writing to WAL: part 2 DBZ-7316

  • Update Groovy version to 4.x DBZ-7340

  • errors.max.retries is not used to stop retrying DBZ-7342

  • Upgrade Antora to 3.1.7 DBZ-7344

  • Oracle connector is occasionally unable to find SCN DBZ-7345

  • Initial snapshot notifications should use full identifier DBZ-7347

  • Upgrade Outbox Extension to Quarkus 3.6.5 DBZ-7352

  • MySqlJdbcSinkDataTypeConverterIT#testBooleanDataTypeMapping fails DBZ-7355

Outlook & What’s next?

The Debezium 2.6 release cycle is one of our most ambitious initiatives with lots of new features and changes. You can find more about what the team is working on specifically for 2.6 and the road to Debezium 3.0 in our road map. If you have any suggestions or ideas, please feel free to get in touch with us on our mailing list or in our Zulip chat.

As the team continues springing into action with Debezium 2.6, we also intend to keep fixing bugs and addressing any regressions reported against last quarter’s Debezium 2.5 release. Debezium 2.5 is now the project’s stable release, and we encourage everyone to upgrade and get the latest and greatest features. In fact, you can expect the next maintenance release of Debezium, 2.5.1.Final, to be released later this week :).

Until next time, happy streaming!

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

   


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.