We are excited to announce a candidate release for Debezium 3.1, 3.1.0.CR1.

This new release includes several improvements with the JDBC sink and MySQL connectors, support for ISO string temporal values and Keyspace heartbeats with Vitess, key-based routing for RabbitMQ, and more. Let’s dive in and take a look at these new features and improvements.

Breaking changes

With any new release of software, there are often breaking changes. The Debezium 3.1.0.CR1 release is no exception, so let’s discuss the major change you should know about.

Query timeout now applies to Oracle LogMiner queries

When the Oracle connector executes its initial query to fetch data from LogMiner, the database.query.timeout.ms connector configuration property now controls how long the query may run before it is cancelled (DBZ-8830). When upgrading, check the connector metric MaxDurationOfFetchQueryInMilliseconds to determine whether this property needs adjusting. By default, the timeout is 10 minutes; setting the property to 0 disables it.
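
As a rough sketch, the timeout could be adjusted in an Oracle connector configuration along the lines of the fragment below. The 20 minute value is purely illustrative, and all other required connector properties are omitted.

```properties
# Illustrative fragment of an Oracle connector configuration (other required properties omitted).
connector.class=io.debezium.connector.oracle.OracleConnector

# Let the LogMiner fetch query run for up to 20 minutes (example value, not a recommendation).
database.query.timeout.ms=1200000

# Alternatively, a value of 0 disables the timeout entirely:
# database.query.timeout.ms=0
```

A reasonable starting point is a value comfortably above the MaxDurationOfFetchQueryInMilliseconds figure observed before the upgrade.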

New features and improvements

Debezium 3.1.0.CR1 introduces new features and improvements across several components:

Core: Centralize logging of sensitive data

We understand that databases house all sorts of information, and that some columns may contain sensitive information. We take pride in making sure that information remains safe and secure. For this reason, we generally prefer to avoid logging sensitive information at INFO, WARN, or ERROR levels.

However, there were some potential corner cases where sensitive column values may be logged at DEBUG or TRACE levels. We added the io.debezium.util.Loggings class several versions ago to centralize this, but not all instances were using this Loggings class (DBZ-8525).

By default, users will notice that the Loggings class now records the sensitive information in the logs, rather than the original logger including it in its own log entry. If you prefer to omit the sensitive information, you can use your logging configuration to set a logging level specific to io.debezium.util.Loggings.

For example, if you need to provide your logs to someone but want the sensitive information omitted, the following configuration can achieve that goal.

log4j.logger.io.debezium=TRACE,stdout
log4j.logger.io.debezium.util.Loggings=ERROR,stdout

This configuration will omit all sensitive information while logging all non-sensitive information at TRACE level.

JDBC: Improved performance

We received several community reports that during peak volume, some databases were experiencing unusually high CPU utilization. After investigation, we identified that several SQL queries were performed too frequently, causing the high CPU and reducing connector write throughput (DBZ-8570). Users should now find that the JDBC sink’s write throughput is higher and the CPU utilization should be more reasonable than before.

JDBC: Automatic retries on connection errors

For a Kafka Connect producer (source), if a connector throws a RetriableException and Kafka Connect is configured to support retries on errors, the runtime will automatically stop and restart the connector. This provides a useful way to handle tearing down resources, such as database connections, and recreating them.

But for a Kafka Connect consumer (sink), the lifecycle of the connector works differently. When the connector throws an error, the lifecycle doesn’t stop and restart the connector, but instead calls the put method again. This can be problematic in the case of certain connection errors because specific resources are not automatically recreated.

Starting with Debezium 3.1, a new JDBC sink connector property connection.restart.on.errors will allow the JDBC sink to retry connection failures (DBZ-8727).
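
A minimal sketch of enabling this in a JDBC sink configuration might look like the following. It assumes the property is a boolean flag that is off unless set; the other sink properties are omitted.

```properties
# Illustrative fragment of a JDBC sink connector configuration (other required properties omitted).
connector.class=io.debezium.connector.jdbc.JdbcSinkConnector

# Assumption: a boolean flag; when true, the sink retries after connection failures.
connection.restart.on.errors=true
```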

JDBC: Handle BYTES as VARBINARY for SQL Server targets

A new JDBC sink mapping has been added for converting a Kafka BYTES field to the VARBINARY column data type (DBZ-8790). This allows source connectors that serialize unknown or other binary data as a Kafka BYTES field to be correctly mapped to a SQL Server target with the VARBINARY column data type.

MySQL: Improved error handling for duplicate server id/uuid

For most connectors, Debezium adopts the philosophy to retry all SQLException or IOException related failures. This strategy has been quite useful, allowing users to utilize the runtime retry mechanism as needed.

However, for MySQL, this presents a unique corner case when there are conflicts with the configured server id/uuid. MySQL uses the server id/uuid to uniquely identify an instance in the cluster topology. If more than one server uses the same id/uuid, the instance will throw a SQLException and enter a retry/backoff loop on startup.

With Debezium 3.1, the error handling prefers a fail-fast approach for this specific unique case (DBZ-8786). If you are a MySQL user and notice your connectors are entering a FAILED status more frequently, we recommend checking if this use case applies to you. If it does, you should guarantee that your configuration always uses a unique server id/uuid value.
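
To illustrate, two MySQL connectors reading from the same cluster could each be given a distinct database.server.id, as sketched below. The numeric ids are hypothetical placeholders; what matters is that no other connector or MySQL server in the topology uses the same value.

```properties
# --- First connector (illustrative fragment, other properties omitted) ---
connector.class=io.debezium.connector.mysql.MySqlConnector
# Hypothetical id; must not match any other connector or server in the cluster.
database.server.id=184054

# --- Second connector, kept in its own configuration file ---
# connector.class=io.debezium.connector.mysql.MySqlConnector
# database.server.id=184055
```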

Vitess: Keyspace heartbeat support

Starting in Vitess v21, a new binlog watermarking strategy was introduced for VStream. This feature sends a "heartbeat"-like event indicating that the shard’s binlog events up to the provided timestamp have been received by the VStream client.

A new configuration option vitess.stream.keyspace.heartbeats can be set to true to include the heartbeat events written to the keyspace heartbeat tables (DBZ-8775). The table.include.list should also include the heartbeat table, using the format <keyspace>.heartbeat.
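
Put together, a Vitess connector configuration that opts in to keyspace heartbeats might resemble the fragment below. The keyspace name commerce and the orders table are hypothetical, and the remaining connector properties are omitted.

```properties
# Illustrative fragment of a Vitess connector configuration (other required properties omitted).
connector.class=io.debezium.connector.vitess.VitessConnector
# Hypothetical keyspace name used for this sketch.
vitess.keyspace=commerce

# Include the heartbeat events written to the keyspace heartbeat table.
vitess.stream.keyspace.heartbeats=true

# The heartbeat table is listed alongside the captured tables, as <keyspace>.heartbeat.
table.include.list=commerce.orders,commerce.heartbeat
```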

Vitess: Support ISO string temporal precision mode

We introduced the new temporal precision mode ISOSTRING in DBZ-6387, which allows temporal values to be serialized as strings in the ISO 8601 format. We’re happy to report that the Vitess connector now supports this mode (DBZ-8826).
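
As a sketch, and assuming the mode is selected through the usual time.precision.mode property with the lowercase value isostring, a Vitess connector could enable it like this:

```properties
# Illustrative fragment of a Vitess connector configuration (other required properties omitted).
connector.class=io.debezium.connector.vitess.VitessConnector

# Assumption: the ISO string mode is chosen via time.precision.mode=isostring,
# so temporal values are emitted as ISO 8601 formatted strings.
time.precision.mode=isostring
```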

Debezium Server: Key routing support for RabbitMQ

In Debezium 3.1, we have changed how you can route events using configuration. The new approach uses a strategy-based design that retains the old behaviors and introduces a new key-based routing mechanism (DBZ-8752).

First and foremost, the rabbitmq.routingKeyFromTopicName property is deprecated and will be removed in a future release. This functionality has been folded into the new rabbitmq.routingKey.source configuration property, which can be set to one of the following values:

static

When using the static routing source, the RabbitMQ sink will use the rabbitmq.routingKey static value you have specified in the sink’s configuration. As this value is set in the configuration and read only during the sink startup, the value is static and does not change over the runtime of the sink.

topic

When using the topic routing source, the RabbitMQ sink will source the routing key based on the destination topic name. This mode replaces the old rabbitmq.routingKeyFromTopicName configuration property behavior, which is now deprecated.

key

When using the new key routing source, the RabbitMQ sink will source the routing key based on the event’s record key. This provides the flexibility to control the routing mechanism for RabbitMQ to use the raw Debezium change event’s key or by using a custom transformation to change the event’s key in-flight before sending the event to RabbitMQ.
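
The three routing sources above could be selected in a Debezium Server configuration roughly as follows. This sketch assumes the properties carry the usual debezium.sink. prefix used for sink settings, and the routing key value inventory-events is a hypothetical name.

```properties
# Illustrative fragment of a Debezium Server application.properties (other properties omitted).
debezium.sink.type=rabbitmq

# Route each event using its record key (the new key-based strategy).
debezium.sink.rabbitmq.routingKey.source=key

# Alternative: derive the routing key from the destination topic name.
# debezium.sink.rabbitmq.routingKey.source=topic

# Alternative: use a fixed routing key read once at sink startup (hypothetical value).
# debezium.sink.rabbitmq.routingKey.source=static
# debezium.sink.rabbitmq.routingKey=inventory-events
```

Only one source can be active at a time, so the alternatives are left commented out.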

Examples: Debezium optimized for GraalVM

Change Data Capture (CDC) is widely used in various contexts, such as microservices communication, legacy system modernization, and cache invalidation. The core idea of this pattern is to detect and track changes in a data source (e.g., a database) and propagate them to other systems in real time or near real time. Debezium is a CDC platform that provides a wide range of connectors for most data sources. Beyond capturing changes, it also offers transformation capabilities through an intuitive UI for defining Debezium instances.

Check out our recent blog post, Superfast Debezium, which walks you through the latest example of using Debezium with GraalVM!

Other changes

The following are some noteworthy changes in 3.1.0.CR1:

  • The first cdc message always lost when using debezium engine to capture oracle data DBZ-8141

  • Update format-maven-plugin to 2.26.0 DBZ-8695

  • Centralize helm chart repo DBZ-8707

  • OTEL libs are not loaded to Docker image DBZ-8767

  • Change the documentation of minimum Java version requirement from 11 to 21 DBZ-8771

  • Add delete.tombstone.handling.mode to ConfigDef returned by config method and change its display name DBZ-8776

  • Signal Channel Kafka restart snapshot multiple snapshot after connector restart DBZ-8780

  • Update Debezium platform images in values.yaml DBZ-8781

  • Allow Debezium server to use Kafka Connect format for the records DBZ-8782

  • Sources and home in debezium platform helm chart points to old repo DBZ-8784

  • Write README for debezium-chart repo DBZ-8785

  • Remove Helm from Debezium operator manifest README DBZ-8791

  • Write blog post about the recent changes on charts.debezium.io DBZ-8792

  • DebeziumServerPostgresIT randomly fails DBZ-8821

  • Test keyspace heartbeats during snapshot DBZ-8824

  • Make methods for adding fields into the record reuseable DBZ-8825

  • Enable build of debezium platform images DBZ-8829

  • Unexpected null value for Field Configuration deprecated aliases DBZ-8832

In total, 30 issues were resolved in Debezium 3.1.0.CR1. The list of changes can also be found in our release notes.

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.

   


About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.