It’s been a busy month in Debezium-land, and it’s my pleasure to announce the first release of the Debezium 2.4 series, 2.4.0.Alpha1.

This release includes 59 changes that cover a wide range of resolved issues, stability improvements, new features, and several breaking changes. Let’s dive into each of these and discuss them in more depth.

Breaking changes

MongoDB

Previously, the MongoDB connector explicitly preferred the secondary in specific scenarios, which created problems for users who wanted to connect to the primary node. Thanks to recent changes (DBZ-6521), this is no longer the case, and the read preference given in the connection string setting is used instead.

Vitess

The Vitess connector’s change event structure has been slightly adjusted thanks to changes (DBZ-6617). The change event’s source information block now includes a new field that identifies the shard the event originated from.

New Features

Offset editor example

Users often express the need to manipulate connector offsets for various reasons. This can be very difficult for those who are not familiar with Kafka’s CLI tools, or with Java if you use Debezium Server. Thanks to a contribution (DBZ-6338) by Nathan Smit, you can now use an editor to manipulate the offsets from the command line or a web-based interface.

Head to our examples repository and follow the README.md to get started.

Error handling

Some Debezium connectors previously used a connector property, errors.max.retries. This property controlled how many times a Debezium connector failure would be explicitly wrapped in a RetriableException, and therefore retried, before the connector threw the raw exception up to the runtime. While this may sound similar to Kafka Connect’s errors.retry.timeout, it effectively gave users a common way to deal with retries across multiple Debezium runtimes, including Kafka Connect, Debezium Server, and Debezium Embedded.

With this release, DBZ-6573 unifies this behavior and makes it available to all connectors.
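The idea of a bounded retry budget can be sketched in a few lines of Python. This is an illustration only, not Debezium’s implementation: RetriableError, run_with_retries, and make_flaky_task are hypothetical names, and the treatment of -1 as "no limit" and 0 as "retries disabled" is an assumption about how such a setting is typically interpreted.

```python
class RetriableError(Exception):
    """Hypothetical stand-in for a failure that is worth retrying."""


def run_with_retries(task, max_retries):
    """Run task, retrying on RetriableError up to max_retries times.

    Assumed semantics: -1 means no limit, 0 disables retries,
    and a positive value is the number of retries allowed.
    """
    attempts = 0
    while True:
        try:
            return task()
        except RetriableError:
            if max_retries != -1 and attempts >= max_retries:
                # Retry budget exhausted: surface the raw failure to the caller.
                raise
            attempts += 1


def make_flaky_task(failures_before_success):
    """Build a task that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RetriableError("connection lost")
        # Return the number of calls it took to succeed.
        return state["calls"]

    return task
```

With a budget of three retries, a task that fails twice succeeds on its third call; with a budget of one, the same task would raise RetriableError to the caller.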

Notify initial snapshot progress

Debezium’s new notification subsystem provides an easy way to integrate third-party tools and applications with Debezium to gain insight into the ongoing change data capture process, above and beyond the traditional JMX approach. In 2.4, the notification subsystem now includes the ability to notify you about the status of the ongoing initial snapshot (DBZ-6416).

Initial snapshot notifications are emitted with an aggregateType of Initial Snapshot and contain a type field that exposes the current status of the snapshot. The possible values include: STARTED, ABORTED, PAUSED, RESUMED, IN_PROGRESS, TABLE_SCAN_COMPLETED, and COMPLETED.
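A consumer of these notifications might filter and interpret them roughly as follows. This is a minimal sketch under stated assumptions: the notification is assumed to arrive as a JSON document, the key names aggregate_type and type are assumptions about the payload layout, and describe_notification is a hypothetical helper rather than part of Debezium.

```python
import json

# Status values listed above for initial snapshot notifications.
SNAPSHOT_STATUSES = {
    "STARTED",
    "ABORTED",
    "PAUSED",
    "RESUMED",
    "IN_PROGRESS",
    "TABLE_SCAN_COMPLETED",
    "COMPLETED",
}


def describe_notification(raw):
    """Summarize an initial snapshot notification, or return None for others."""
    event = json.loads(raw)
    # Assumed key names; adjust them to match the payload you actually receive.
    if event.get("aggregate_type") != "Initial Snapshot":
        return None
    status = event.get("type")
    if status not in SNAPSHOT_STATUSES:
        raise ValueError(f"unexpected snapshot status: {status!r}")
    # ABORTED and COMPLETED are treated here as the end of the snapshot.
    return {"status": status, "finished": status in ("ABORTED", "COMPLETED")}


sample = json.dumps({"aggregate_type": "Initial Snapshot", "type": "IN_PROGRESS"})
summary = describe_notification(sample)
```

Treating only ABORTED and COMPLETED as terminal is a design choice of this sketch; a real consumer may want to react to PAUSED and RESUMED as well.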

MySQL improvements

Thanks to a contribution provided by Harvey Yue (DBZ-6472), the MySQL connector will use parallelization to generate schema events during its snapshot phase. This should improve the overall performance when capturing the schema for many tables in your database. We plan to investigate how this can be extended to other relational connectors.

MongoDB improvements

The MongoDB connector continues to see lots of active development. This release introduces several new features specifically for MongoDB, which include:

  • Cluster-wide privileges are no longer necessary when watching a single database or collection (DBZ-6182).

  • Read preference taken from connection string (DBZ-6468, DBZ-6578).

  • Support authentication with TC MongoDB deployments (DBZ-6596).

As we continue to make further improvements to the MongoDB connector, please let us know if there are still rough edges or enhancements that will help streamline its usage.

Oracle improvements

Debezium 2.4 supports several new Oracle data types, which include XML_TYPE and RAW (DBZ-3605). Two new Oracle dependencies were necessary to support XML: xdb and xmlparserv2. These dependencies are not redistributable, so they’re not included in the connector plugin archive by default, much like the connector’s driver. You must obtain these directly from Maven Central or Oracle, just like the driver dependency.

In addition, XML works similarly to CLOB and BLOB data types; therefore, the connector must be configured with lob.enabled set to true to ingest XML changes. We’d love to hear your feedback on this new feature as it’s been requested for quite some time.
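As a rough illustration, the following Python snippet builds a connector registration payload with lob.enabled switched on. It is a sketch only: the connector name and the table in table.include.list are hypothetical, and the connection settings a real Oracle connector needs are deliberately omitted.

```python
import json

connector_config = {
    "name": "oracle-xml-example",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.oracle.OracleConnector",
        # Required for XML changes to be ingested, as described above.
        "lob.enabled": "true",
        # Hypothetical table containing an XML column.
        "table.include.list": "INVENTORY.DOCUMENTS",
        # Connection and schema history settings are omitted from this sketch.
    },
}

# Serialize to JSON, the format used when registering a connector.
payload = json.dumps(connector_config, indent=2)
print(payload)
```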

JDBC sink improvements

Thanks to a contribution from Nicholas Fwang (DBZ-6595), the JDBC sink connector can now reference values from the change event’s source information block as a part of the connector’s configuration property table.name.format. If you want to reference such fields, simply use ${source.<field-name>} in the configuration, and the field’s value will be used.
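To make the substitution concrete, here is a small Python sketch of how a ${source.<field-name>} placeholder could be expanded. It is not the connector’s code: resolve_table_name is a hypothetical helper, the source block is modeled as a plain dictionary, and the field names schema and table in the usage line are assumptions chosen for illustration.

```python
import re

# Matches placeholders of the form ${source.<field-name>}.
_SOURCE_PLACEHOLDER = re.compile(r"\$\{source\.([A-Za-z0-9_]+)\}")


def resolve_table_name(name_format, source):
    """Expand ${source.<field-name>} placeholders using the source block."""

    def substitute(match):
        field = match.group(1)
        if field not in source:
            raise KeyError(f"source block has no field named {field!r}")
        return str(source[field])

    return _SOURCE_PLACEHOLDER.sub(substitute, name_format)


table_name = resolve_table_name(
    "${source.schema}_${source.table}",
    {"schema": "inventory", "table": "orders"},
)
```

In this sketch a missing field raises an error instead of silently producing an empty name; how the connector itself reacts to an unknown field is not covered here.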

In addition, Roman Kudryashov also contributed the ability to resolve a row’s primary key from a header defined on the change event (DBZ-6602). To use this new feature, specify the connector configuration property primary.key.mode as record_header. If the header value is a primitive type, you will need to define a primary.key.fields configuration similar to how you would if the event’s record key was a primitive. If the header value is a struct type, all fields of the structure will be used by default, but specifying the primary.key.fields property allows you to choose a subset of fields from the header as the key.
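The key resolution rules described above can be sketched as follows. This is an illustrative model, not the connector’s implementation: resolve_primary_key is a hypothetical helper, a struct header is represented as a dictionary, and requiring exactly one primary.key.fields entry for a primitive header is an assumption of the sketch.

```python
def resolve_primary_key(header_value, primary_key_fields=None):
    """Return a mapping of key column name to value for a header-based key."""
    if isinstance(header_value, dict):
        # Struct-like header: use every field unless a subset was requested.
        if not primary_key_fields:
            return dict(header_value)
        missing = [f for f in primary_key_fields if f not in header_value]
        if missing:
            raise KeyError(f"header is missing key fields: {missing}")
        return {f: header_value[f] for f in primary_key_fields}
    # Primitive header: the column name must come from primary.key.fields.
    if not primary_key_fields or len(primary_key_fields) != 1:
        raise ValueError("a primitive header needs one primary.key.fields entry")
    return {primary_key_fields[0]: header_value}


# A primitive header value mapped to a single named key column.
primitive_key = resolve_primary_key(42, ["id"])
# A struct header narrowed to a subset of its fields.
subset_key = resolve_primary_key({"id": 7, "region": "eu", "note": "x"}, ["id", "region"])
```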

Spanner improvements

Under certain conditions, a Spanner connector might not advance from the START_INITIAL_SYNC state during initialization. After investigation by Nancy Xu, a new configuration option was introduced to supply a configurable timeout. This can be done by adding the following to the connector’s configuration:

connector.spanner.task.await.initialization.timeout=<timeout in milliseconds>

Debezium UI metrics

The Debezium UI project allows you to easily deploy any Debezium connector onto Kafka Connect using a web-based interface. This release has improved the interface by including several connector metrics (DBZ-5321) on the main connector listing view. We’d love your feedback on this change and welcome any suggestions on other metrics you may find useful.

Other fixes

In addition, there were quite a number of stability and bug fixes that made it into this release. These include the following:

  • Mysql connector fails to parse statement FLUSH FIREWALL_RULES DBZ-3925

  • Add the API endpoint to expose running connector metrics DBZ-5359

  • Display critical connector metrics DBZ-5360

  • Snapshot result not saved if LAST record is filtered out DBZ-5464

  • Define and document schema history topic messages schema DBZ-5518

  • Align query.fetch.size across connectors DBZ-5676

  • Upgrade to Apache Kafka 3.5.0 DBZ-6047

  • Remove downstream related code from UI Frontend code DBZ-6394

  • Make Signal actions extensible DBZ-6417

  • CloudEventsConverter throws static error on Kafka Connect 3.5+ DBZ-6517

  • Dependency io.debezium:debezium-testing-testcontainers affects logback in tests DBZ-6525

  • Cleanup duplicate jobs from jenkins DBZ-6535

  • Implement sharded MongoDB ocp deployment and integration tests DBZ-6538

  • Batches with DELETE statement first will skip everything else DBZ-6576

  • Oracle unsupported DDL statement - drop multiple partitions DBZ-6585

  • Only Struct objects supported for [Header field insertion], found: null DBZ-6588

  • Support PostgreSQL coercion for UUID, JSON, and JSONB data types DBZ-6589

  • MySQL parser cannot parse CAST AS dec DBZ-6590

  • Refactor retry handling in Redis schema history DBZ-6594

  • Excessive Log Message 'Marking Processed Record for Topic' DBZ-6597

  • Support for custom tags in the connector metrics DBZ-6603

  • Fixed DataCollections for table scan completion notification DBZ-6605

  • Oracle connector is not recoverable if ORA-01327 is wrapped by another JDBC or Oracle exception DBZ-6610

  • Fatal error when parsing Mysql (Percona 5.7.39-42) procedure DBZ-6613

  • Build of Postgres connector fails when building against Kafka 2.X DBZ-6614

  • Upgrade postgresql driver to v42.6.0 DBZ-6619

  • MySQL ALTER USER with RETAIN CURRENT PASSWORD fails with parsing exception DBZ-6622

  • Upgrade Quarkus to 3.2.0.Final DBZ-6626

  • Inaccurate documentation regarding additional-condition DBZ-6628

  • Oracle connection SQLRecoverableExceptions are not retried by default DBZ-6633

  • Upgrade kcctl to 1.0.0.Beta3 DBZ-6642

  • Cannot delete non-null interval value DBZ-6648

  • Upgrade gRPC to 1.56.1 DBZ-6649

  • ConcurrentModificationException thrown in Debezium 2.3 DBZ-6650

  • Dbz crashes on parsing Mysql Procedure Code (Statement Labels) DBZ-6651

  • CloudEvents converter is broken for JSON message deserialization DBZ-6654

  • Vitess: Connector fails if table name is a mysql reserved word DBZ-6656

  • Junit conflicts caused by test-containers module using transitive Junit5 from quarkus DBZ-6659

  • Disable Kafka 2.x CRON trigger DBZ-6667

What’s next?

This initial release of Debezium 2.4 is already packed with lots of new features and the team is only getting started. Looking at our road map, we’ve already tackled nearly half of our plans for 2.4, but much still remains including:

  • Single message transforms for TimescaleDB and Timestamps

  • OpenLogReplicator ingestion for Oracle

  • Ad-hoc blocking snapshots

  • Parallelization of Debezium Embedded

  • Parallel incremental snapshots for MongoDB

  • Further improvements to Debezium UI

We intend to stick to our approximately two-week cadence, so expect Alpha2 at the start of August. Until then, please be sure to get in touch with us on the mailing list or our chat if you have any ideas or suggestions.

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

   


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.