Another release cadence has come to a close, and we’re pleased to announce that the next preview release of Debezium, 3.2.0.Alpha1, is available. This release is built on Kafka 4.0 and includes several breaking changes, along with many improvements and bug fixes.

Let’s take a moment and dive into all these changes.

Breaking changes

With any new major release of software, there are often several breaking changes. The Debezium 3.2.0.Alpha1 release is no exception, so let’s discuss the major changes you should be aware of.

Debezium Core

Debezium for IBMi

Debezium for Oracle

Debezium Core

Built on Kafka 4.0

Debezium 3.2 is built and tested using Kafka 4.0 (DBZ-8875).

Debezium for IBMi

Error raised on missing journal files

In previous versions of Debezium for IBMi, if a journal file was missing, the connector resumed from the first available journal, which could lead to silent data loss. This behavior has changed: the connector now throws an error if a required journal file no longer exists (DBZ-8898).

Debezium for Oracle

Using secure mTLS connections with JKS

When using the Debezium for Oracle connector to establish a secure mTLS connection using Java Keystores (JKS), special configuration is necessary. We have added this information to the Oracle connector documentation (DBZ-8788).
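For illustration, such a setup can be sketched with the connector’s database.url property and the driver.* pass-through prefix, which forwards properties to the Oracle JDBC driver. The host, service name, paths, and passwords below are placeholders, and the exact set of properties depends on your listener setup; treat this as a sketch and consult the connector documentation for the authoritative configuration:

{
  "database.url": "jdbc:oracle:thin:@tcps://oracle.example.com:2484/ORCLPDB1",
  "driver.javax.net.ssl.keyStore": "/path/to/client-keystore.jks",
  "driver.javax.net.ssl.keyStorePassword": "********",
  "driver.javax.net.ssl.keyStoreType": "JKS",
  "driver.javax.net.ssl.trustStore": "/path/to/truststore.jks",
  "driver.javax.net.ssl.trustStorePassword": "********",
  "driver.javax.net.ssl.trustStoreType": "JKS"
}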

New features and improvements

The following describes all noteworthy new features and improvements in Debezium 3.2.0.Alpha1. For a complete list, be sure to read the release notes.

Debezium Core

Debezium JDBC sink

Debezium AI

Debezium for IBMi

Debezium Embedded

Debezium Platform

Debezium for Oracle

Debezium Server

Debezium Core

Regression with logging performance

In Debezium 3.1, a change was introduced to centralize the logging of sensitive information. Unfortunately, this change introduced a regression, leading to lower performance across a variety of code paths.

This change has been reverted and replaced with an implementation that retains the centralized logging intent while restoring the prior performance (DBZ-8879).

Reset certain streaming metrics through JMX

During periods of idle activity, a Debezium connector continues to report the LagBehindSource JMX metric as its last computed value, because this value is only updated when new changes are received. For some environments, this is less than desirable or unintuitive if you are unaware of the idle or low-activity window.

Debezium 3.1.1.Final introduces a new option that can be triggered through JConsole or other JMX integrations to reset the current LagBehindSource metric by invoking the new resetLagBehindSource operation (DBZ-8885).
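If you prefer to automate this rather than use JConsole, the operation can also be invoked programmatically over a standard JMX connection. A minimal sketch follows; the JMX service URL, connector type, and server name are illustrative, and the MBean name follows the usual debezium.<connector>:type=connector-metrics,context=streaming,server=<topic.prefix> pattern:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ResetLagBehindSource {
    public static void main(String[] args) throws Exception {
        // Illustrative JMX endpoint of the Kafka Connect worker running the connector
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
        JMXConnector jmxConnector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection server = jmxConnector.getMBeanServerConnection();
            // Streaming metrics MBean for an Oracle connector whose topic.prefix is "myserver"
            ObjectName streamingMetrics = new ObjectName(
                    "debezium.oracle:type=connector-metrics,context=streaming,server=myserver");
            // Invoke the new operation introduced by DBZ-8885
            server.invoke(streamingMetrics, "resetLagBehindSource", new Object[0], new String[0]);
        }
        finally {
            jmxConnector.close();
        }
    }
}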

Log JMX MBean name when registration fails

When a JMX metric MBean fails to register, it’s most often because an MBean is already registered with that name. This can happen for a variety of reasons, including a prior task not stopping gracefully or a misconfiguration where two connectors share the same topic.prefix.

Unfortunately, the logged message did not indicate which MBean and connector the failure referred to. To address this, the JMX MBean name is now logged when registration fails, so that the exact connector deployment that’s affected can be easily determined (DBZ-8862).

Debezium for IBMi

Add support for BOOLEAN data types

IBM supports the BOOLEAN data type on V7R5+ and on later V7R4 releases of the IBM iSeries database. In prior versions, if a table contained a boolean column, the Debezium for IBMi connector failed with a hard error.

With Debezium 3.2, BOOLEAN data types are now captured without throwing an exception (DBZ-7796).

Add decimal handling mode support

The decimal.handling.mode configuration property specifies how the connector handles decimal and floating point values for connector-specific data types. In prior versions of the Debezium for IBMi connector, this configuration property was not honored.

Debezium 3.2 introduces support for decimal.handling.mode, allowing you to specify how these values are serialized (DBZ-8301). This configuration property can be set to one of the following three values; a short configuration example follows these descriptions:

precise

Represents values precisely by using java.math.BigDecimal values, which are represented in change events in binary form.

double

Represents values by using double values. Using double values is easier, but can result in a loss of precision.

string

Encodes values as formatted strings. Using the string option is easier to consume, but results in a loss of semantic information about the real type.
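For example, a connector configuration that serializes decimal values as strings would include the following (only the relevant property is shown):

{
  "decimal.handling.mode": "string"
}

With this setting, a column value such as 12.34 arrives in the change event as the literal string "12.34".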

Debezium for Oracle

Unbuffered LogMiner adapter using committed data only

Debezium for Oracle introduces a new adapter implementation that buffers transactions on the relational database using the database’s memory buffers rather than relying on the JVM heap or off-heap cache implementations (DBZ-8924).

To get started with the new adapter, the connector configuration needs to be adjusted, as shown here:

{
  "database.connection.adapter": "logminer_unbuffered"
}

While most of the log.mining.* connector configurations are honored by this new adapter, any configuration that relates to buffer, cache, or transaction management is not used.

This new adapter implementation is based on Oracle LogMiner’s COMMITTED_DATA_ONLY mode. This mode forces Oracle LogMiner to use the database’s available memory to buffer transactions and to only supply Debezium with committed changes. Because Debezium only receives committed changes, the connector can immediately dispatch changes as they’re read which avoids complex memory sizing requirements along with the need to handle partial rollbacks due to savepoints or constraint violations.

Because Oracle LogMiner is now responsible for buffering transactions until the commit is observed, it relies on the database’s memory to handle this staging requirement. This means that the staging capacity is bounded by the PGA_AGGREGATE_LIMIT instance parameter configured by the database administrator. When a transaction’s memory requirement exceeds this limit, Oracle refuses to buffer the transaction and the connector raises an error. This can be resolved by raising the configured limit, removing the limit entirely, or resizing your transactions so they fit within this Oracle limit.

This feature is currently in an incubating state, i.e. its exact semantics, configuration options, etc. may change in future revisions based on the feedback we receive. Please let us know if you encounter any problems while using it.

Filter Oracle LogMiner results by client id

The Oracle LogMiner adapters provide a myriad of ways to filter transactions by explicitly passing filters to the database query, for example excluding or only including transactions performed by specific database usernames. In this release, we’ve added another interesting filter criterion based on the Oracle LogMiner field CLIENT_ID, allowing you to include or exclude changes based on this field’s value (DBZ-8904).

The following configuration properties can be used:

log.mining.clientid.include.list

Specifies a comma-separated list of values to match against the CLIENT_ID field for capture.

log.mining.clientid.exclude.list

Specifies a comma-separated list of values to match against the CLIENT_ID field to exclude from capture.

Just like any of the other include/exclude configuration properties, these are mutually exclusive.
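For example, to capture only changes made by sessions that identify themselves with specific client identifiers, you could configure the following (the client id values are illustrative):

{
  "log.mining.clientid.include.list": "batch_app,reporting_app"
}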

Reduced CPU utilization under specific scenarios

In Debezium 3.1, we introduced a change as part of DBZ-8665 to restore the performance of 2.7.0.Final when processing constraint violations or savepoint rollback operations. While this change was successful at reducing the latency caused by processing such events in Debezium 3.0 through 3.0.7, we found that even the performance of Debezium 2.7 was overall suboptimal.

We have implemented a complete rework of the transaction buffering solution to handle constraint violations and savepoint rollbacks more efficiently (DBZ-8860).

When using heap-based buffering, we reduced the time needed to process such events by nearly 90%, and for off-heap buffering by 97-99%. In addition to the reduced processing time, we have also lowered the overall CPU usage while handling these events so that it remains aligned with expectations.

Improved the online_catalog mining strategy performance

Prior to adding the hybrid mining strategy, the Debezium Oracle LogMiner implementation included a specific condition to include events for tables where LogMiner failed to resolve the table name. This use case happens when the object id and version in the redo entry do not match the online data dictionary, which occurs after specific DDL operations are performed.

Including these changes, particularly when users perform bulk operations and LogMiner fails to resolve the table name, increases latency and connector overhead solely so that the connector can log the unknown table. Given that the performance cost served only this logging, we have chosen to omit these events from the data fetch moving forward (DBZ-8926).

Improved the hybrid mining strategy performance

We also identified another performance bottleneck, this time when using the hybrid mining strategy while processing bulk events where LogMiner failed to resolve the table name due to object id/version mismatches. The hybrid strategy is designed to handle this use case and fall back to Debezium’s relational model to resolve the table name; however, despite using a cache, the overhead of the cache lookups was significant for bulk operations.

In order to reduce this cost and improve the throughput of bulk operations on unknown tables, we have reworked the lookup in a way that significantly increases throughput, allowing bulk operations to be handled more efficiently and with less overall CPU utilization (DBZ-8925).

Improve log message when failing to apply a partial rollback

When using the Debezium for Oracle LogMiner buffered adapter, the log entry written when a partial rollback is observed did not capture critical information that could be useful for debugging. Debezium 3.2 now includes the transaction identifier and the system change number associated with the partial rollback redo entry (DBZ-8944).

Debezium JDBC sink

Configuration available to column/table naming strategies

The Debezium JDBC sink offers a variety of configurable hooks, including the option to define deployment-specific TableNamingStrategy or ColumnNamingStrategy implementations. However, these implementations were previously unable to obtain the full configuration of the connector deployment, making these hooks far less flexible than intended.

With Debezium 3.2, these strategy contracts provide a new configure method (DBZ-7051), shown here:

@Override
public void configure(Map<String, Object> config) {
}

To provide backward compatibility, this method is defined as default with no implementation, so no code changes are required where this configuration step is unnecessary.
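As a sketch of how this could be used, a custom strategy might read a deployment-specific property in configure and apply it when resolving names later. The class below only illustrates the new hook; the strategy interface’s resolution methods are omitted and the property name is hypothetical:

import java.util.Map;

public class PrefixAwareTableNamingStrategy /* implements TableNamingStrategy */ {

    private String prefix = "";

    // New hook from DBZ-7051: receives the full connector configuration
    public void configure(Map<String, Object> config) {
        // "custom.table.prefix" is an illustrative property, not a built-in option
        Object value = config.get("custom.table.prefix");
        if (value != null) {
            prefix = value.toString();
        }
    }

    // The strategy's naming method(s) would then prepend the configured prefix
    // to the resolved table name.
}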

Debezium Embedded

New polling started/ended callbacks added

The DebeziumEngine interface provides a variety of features, including knowing when the connector or task has started or stopped. However, because the connector operates asynchronously, it may be useful for specific user-driven logic to know when the connector has entered the polling phase.

To address this concern, two new methods have been added to the ConnectorCallback contract (DBZ-8948):

/**
 * Called after all the tasks have been successfully started and engine has moved to polling phase, but before actual polling is started.
 */
default void pollingStarted() {
    // nothing by default
}

/**
 * Called after all the tasks have successfully exited from the polling loop, i.e. the callback is not called when any of the tasks has thrown
 * exception during polling or was interrupted abruptly.
 */
default void pollingStopped() {
    // nothing by default
}

Now, when setting up a DebeziumEngine instance, the supplied ConnectorCallback implementation can include logic to run during these lifecycle state changes as needed.
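A minimal sketch of wiring these callbacks into an embedded engine might look like the following; the connector properties and the choice of the JSON format are illustrative:

import java.util.Properties;

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

public class PollingCallbackExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // connector and offset storage properties omitted for brevity

        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .using(new DebeziumEngine.ConnectorCallback() {
                    @Override
                    public void pollingStarted() {
                        System.out.println("Engine entered the polling phase");
                    }

                    @Override
                    public void pollingStopped() {
                        System.out.println("Engine exited the polling loop");
                    }
                })
                .notifying(record -> {
                    // handle each change event
                })
                .build();

        // Typically the engine is executed on a dedicated executor; run() blocks until stopped.
        engine.run();
    }
}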

Debezium Server

Milvus allows unwinding of JSON data types

Debezium source connectors are designed to emit JSON data type values as an io.debezium.data.Json semantic type that encodes the JSON value as a string. However, this may not always be the desired outcome when sinking changes to Milvus using Debezium Server.

A new configuration property, debezium.sink.milvus.unwind.json, has been added that can be set to either true or false (the default). When this property is set to true, the JSON string value will be represented as a JsonObject instead (DBZ-8909).
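In a Debezium Server application.properties file this might look like the following sketch (the sink type line is shown only for context, and any other Milvus connection settings are omitted):

debezium.sink.type=milvus
debezium.sink.milvus.unwind.json=true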

Redis can skip heartbeat messages

Debezium source connectors are often configured with heartbeat events so that the offset information for the source does not become stale during periods of low activity. However, for some sinks like Redis, these heartbeat events aren’t useful to pass along to the sink target.

A new configuration property, debezium.sink.redis.skip.heartbeat.messages, has been added that can be set to either true or false (the default). When this property is set to true, the Redis sink will skip emitting heartbeat events to the Redis target; however, the heartbeat events will continue to influence the management of stale offsets (DBZ-8911).
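A corresponding sketch for the Redis sink, with an illustrative address:

debezium.sink.type=redis
debezium.sink.redis.address=localhost:6379
debezium.sink.redis.skip.heartbeat.messages=true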

Debezium AI

Introduce timeout for Ollama embedding models

We have added a new configuration property, ollama.operation.timeout.ms, for the Debezium AI Ollama model integration used by the FieldToEmbedding transformation. This configuration property specifies the number of milliseconds that the model operation is allowed to execute before the request times out. By default, the transformation waits 15 seconds, but this can be adjusted as needed (DBZ-8908).
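For example, assuming the transformation is registered under the alias embedding (the alias and the 30-second value are illustrative, and the transform’s type and other required properties are omitted), the timeout could be raised like so:

{
  "transforms": "embedding",
  "transforms.embedding.ollama.operation.timeout.ms": "30000"
}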

Debezium Platform

Improve navigation and workflow for transformations

We have added several new features to the Debezium Management Platform interface for transformations (DBZ-8328), including:

  • A new main navigation menu option called Transforms.

  • Improved the pipeline creation workflow allowing transformations to be added in-flight.

  • Support for modifying and removing existing transformations.

We’d love your feedback on the new navigation workflow and improvements around transformations.

Other changes

  • Incorrect NumberOfEventsFiltered metrics in streaming DBZ-8576

  • Signal table column names are arbitrary, but delete strategy expects column named id DBZ-8723

  • Prevent write operations in PostgreSQL in read-only mode. DBZ-8743

  • Upgrade MariaDB driver to 3.5.3 DBZ-8758

  • Add Localization support to UI DBZ-8859

  • Upgrade RocketMQ version from 5.1.4 to 5.2.0 DBZ-8864

  • Bump Chicory version and take advantage of latest improvements DBZ-8867

  • When using the Oracle relaxed SQL parser setup, strings with apostrophe followed by comma are trimmed DBZ-8869

  • Oracle Ehcache buffer will silently evict entries when configured size limits are reached DBZ-8874

  • Improve MySQL/MariaDB connector resilience during post-schema recovery reconnect DBZ-8877

  • Transaction events are not removed when transaction event count over threshold DBZ-8880

  • Setting Oracle buffer type to an unsupported/invalid value is not validated properly DBZ-8886

  • Oracle timestamp columns are ignored when temporal mode set to ISOSTRING DBZ-8889

  • Kinesis Connector does not send failed records during retry, it sends records in original batch DBZ-8893

  • DDL parsing fails on "BY USER FOR STATISTICS" virtual column clause DBZ-8895

  • Postgres CapturedTables metric isn’t populated. DBZ-8897

  • Raise more meaningful exception in case of inconsistent post processor config DBZ-8901

  • Update Outbox Extension Quarkus version to 3.21.2 DBZ-8905

  • Update to latest LTS of Quarkus 3.15.4 DBZ-8906

  • FieldToEmbedding SMT fails with NPE for delete records DBZ-8907

  • FieldToEmbedding SMT crashes when source field name is substring of embedding name DBZ-8910

  • Improve performance by removing unnecessary filter check DBZ-8921

  • Async engine doesn’t terminate gracefully upon StopEngineException DBZ-8936

  • Remove unnecessary metadata query and map fetch calls DBZ-8938

  • Processing error because of incomplete date part of DATETIME datatype in MariaDB DBZ-8940

  • [Conductor] Add endpoint to verify correct setup of signal data collection DBZ-8941

  • ORA-08186 invalid timestamp specified occurs when connector is started DBZ-8943

  • Multiple Predicates Don’t Function with the Operator API DBZ-8975

In total, 78 issues were resolved in Debezium 3.2.0.Alpha1. The list of changes can also be found in our release notes.

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.



About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.