Another release cadence has come to a close, and we’re pleased to announce that the next preview release of Debezium, 3.2.0.Alpha1, is available. This release is built on Kafka 4.0 and includes several breaking changes, along with many improvements and bug fixes.

Let’s take a moment and dive into all these changes.

Breaking changes

With any new major release of software, there are often several breaking changes. The Debezium 3.2.0.Alpha1 release is no exception, so let’s discuss the major changes you should be aware of.

Debezium Core

Debezium for IBMi

Debezium for Oracle

Debezium Core

Built on Kafka 4.0

Debezium 3.2 is built and tested using Kafka 4.0 (DBZ-8875).

Debezium for IBMi

Error raised on missing journal files

In previous versions of Debezium for IBMi, if a journal file was missing, the connector resumed from the first available journal, which could lead to silent data loss. This behavior has changed: the connector now throws an error if a required journal file no longer exists (DBZ-8898).

Debezium for Oracle

Using secure mTLS connections with JKS

When using the Debezium for Oracle connector to establish a secure mTLS connection using Java Keystores (JKS), special configuration is necessary. We have added this information to the Oracle connector documentation (DBZ-8788).
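For illustration, such a setup can be sketched with the connector’s database.url property and the driver.* pass-through prefix, which forwards properties to the Oracle JDBC driver. The host, service name, paths, and passwords below are placeholders, and the exact set of properties depends on your listener setup; treat this as a sketch and consult the connector documentation for the authoritative configuration:

{
  "database.url": "jdbc:oracle:thin:@tcps://oracle.example.com:2484/ORCLPDB1",
  "driver.javax.net.ssl.keyStore": "/path/to/client-keystore.jks",
  "driver.javax.net.ssl.keyStorePassword": "********",
  "driver.javax.net.ssl.keyStoreType": "JKS",
  "driver.javax.net.ssl.trustStore": "/path/to/truststore.jks",
  "driver.javax.net.ssl.trustStorePassword": "********",
  "driver.javax.net.ssl.trustStoreType": "JKS"
}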

New features and improvements

The following describes all noteworthy new features and improvements in Debezium 3.2.0.Alpha1. For a complete list, be sure to read the release notes.

Debezium Core

Debezium JDBC sink

Debezium AI

Debezium for IBMi

Debezium Embedded

Debezium Platform

Debezium for Oracle

Debezium Server

Debezium Core

Regression with logging performance

In Debezium 3.1, a change was introduced to centralize the logging of sensitive information. Unfortunately, this change introduced a regression, leading to lower performance across a variety of code paths.

This change has been reverted and replaced with an implementation that retains the centralized logging intent while restoring the prior performance (DBZ-8879).

Reset certain streaming metrics through JMX

During periods of idle activity, a Debezium connector continues to report the LagBehindSource JMX metric as its last computed value, because this value is only updated when new changes are received. For some environments, this is less than desirable or unintuitive if you are unaware of the idle or low-activity window.

Debezium 3.1.1.Final introduces a new option that can be triggered through JConsole or other JMX integrations to reset the current LagBehindSource metric by invoking the new resetLagBehindSource operation (DBZ-8885).
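If you prefer to automate this rather than use JConsole, the operation can also be invoked programmatically over a standard JMX connection. A minimal sketch follows; the JMX service URL, connector type, and server name are illustrative, and the MBean name follows the usual debezium.<connector>:type=connector-metrics,context=streaming,server=<topic.prefix> pattern:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ResetLagBehindSource {
    public static void main(String[] args) throws Exception {
        // Illustrative JMX endpoint of the Kafka Connect worker running the connector
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
        JMXConnector jmxConnector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection server = jmxConnector.getMBeanServerConnection();
            // Streaming metrics MBean for an Oracle connector whose topic.prefix is "myserver"
            ObjectName streamingMetrics = new ObjectName(
                    "debezium.oracle:type=connector-metrics,context=streaming,server=myserver");
            // Invoke the new operation introduced by DBZ-8885
            server.invoke(streamingMetrics, "resetLagBehindSource", new Object[0], new String[0]);
        }
        finally {
            jmxConnector.close();
        }
    }
}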

Log JMX MBean name when registration fails

When a JMX metric MBean fails to register, it’s most often because an MBean is already registered with that name. This can happen for a variety of reasons, including a prior task not stopping gracefully or a misconfiguration where two connectors share the same topic.prefix.

Unfortunately, the logged message did not indicate which MBean and connector the failure referred to. To address this, the JMX MBean name is now logged when registration fails, so that the exact connector deployment that’s affected can be easily determined (DBZ-8862).

Debezium for IBMi

Add support for BOOLEAN data types

IBM supports the BOOLEAN data type on V7R5+ and on later V7R4 releases of the IBM iSeries database. In prior versions, if a table contained a boolean column, the Debezium for IBMi connector failed with a hard error.

With Debezium 3.2, BOOLEAN data types are now captured without throwing an exception (DBZ-7796).

Add decimal handling mode support

The decimal.handling.mode configuration property specifies how the connector handles decimal and floating point values for connector-specific data types. In prior versions of the Debezium for IBMi connector, this configuration property was not honored.

Debezium 3.2 introduces support for decimal.handling.mode, allowing you to specify how these values are serialized (DBZ-8301). This configuration property can be set to one of the following three values; a short configuration example follows these descriptions:

precise

Represents values precisely by using java.math.BigDecimal values, which are represented in change events in binary form.

double

Represents values by using double values. Using double values is easier, but can result in a loss of precision.

string

Encodes values as formatted strings. Using the string option is easier to consume, but results in a loss of semantic information about the real type.
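For example, a connector configuration that serializes decimal values as strings would include the following (only the relevant property is shown):

{
  "decimal.handling.mode": "string"
}

With this setting, a column value such as 12.34 arrives in the change event as the literal string "12.34".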

Debezium for Oracle

Unbuffered LogMiner adapter using committed data only

Debezium for Oracle introduces a new adapter implementation that buffers transactions on the relational database using the database’s memory buffers rather than relying on the JVM heap or off-heap cache implementations (DBZ-8924).

To get started with the new adapter, the connector configuration needs to be adjusted, as shown here:

{
  "database.connection.adapter": "logminer_unbuffered"
}

While most of the log.mining.* connector configurations are honored by this new adapter, any configuration that relates to buffer, cache, or transaction management is not used.

This new adapter implementation is based on Oracle LogMiner’s COMMITTED_DATA_ONLY mode. This mode forces Oracle LogMiner to use the database’s available memory to buffer transactions and to only supply Debezium with committed changes. Because Debezium only receives committed changes, the connector can immediately dispatch changes as they’re read which avoids complex memory sizing requirements along with the need to handle partial rollbacks due to savepoints or constraint violations.

Because Oracle LogMiner is now responsible for buffering transactions until the commit is observed, it relies on the database’s memory to handle this staging requirement. This means that the staging capacity is bounded by the PGA_AGGREGATE_LIMIT instance parameter configured by the database administrator. When a transaction’s memory requirement exceeds this limit, Oracle refuses to buffer the transaction and the connector raises an error. This can be resolved by raising the configured limit, removing the limit entirely, or resizing your transactions so they fit within this Oracle limit.

This feature is currently in an incubating state, i.e. its exact semantics, configuration options, etc. may change in future revisions based on the feedback we receive. Please let us know if you encounter any problems while using it.

Filter Oracle LogMiner results by client id

The Oracle LogMiner adapters provide a myriad of ways to filter transactions by explicitly passing filters to the database query, for example excluding or only including transactions performed by specific database usernames. In this release, we’ve added another interesting filter criterion based on the Oracle LogMiner field CLIENT_ID, allowing you to include or exclude changes based on this field’s value (DBZ-8904).

The following configuration properties can be used:

log.mining.clientid.include.list

Specifies a comma-separated list of values to match against the CLIENT_ID field for capture.

log.mining.clientid.exclude.list

Specifies a comma-separated list of values to match against the CLIENT_ID field to exclude from capture.

Just like any of the other include/exclude configuration properties, these are mutually exclusive.
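For example, to capture only changes made by sessions that identify themselves with specific client identifiers, you could configure the following (the client id values are illustrative):

{
  "log.mining.clientid.include.list": "batch_app,reporting_app"
}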

Reduced CPU utilization under specific scenarios

In Debezium 3.1, we introduced a change as part of DBZ-8665 to restore the performance of 2.7.0.Final when processing constraint violations or savepoint rollback operations. While this change was successful at reducing the latency caused by processing such events in Debezium 3.0 through 3.0.7, we found that even the performance of Debezium 2.7 was overall suboptimal.

We have implemented a complete rework of the transaction buffering solution to handle constraint violations and savepoint rollbacks more efficiently (DBZ-8860).

When using heap-based buffering, we reduced the time needed to process such events by nearly 90%, and for off-heap buffering by 97-99%. In addition to the reduced processing time, we have also lowered the overall CPU usage while handling these events so that it remains aligned with expectations.

Improved the online_catalog mining strategy performance

Prior to adding the hybrid mining strategy, the Debezium Oracle LogMiner implementation included a specific condition to include events for tables where LogMiner failed to resolve the table name. This use case happens when the object id and version in the redo entry do not match the online data dictionary, which occurs after specific DDL operations are performed.

Including these changes, particularly when users perform bulk operations and LogMiner fails to resolve the table name, increases latency and connector overhead solely so that the connector can log the unknown table. Given that the performance cost served only this logging, we have chosen to omit these events from the data fetch moving forward (DBZ-8926).

Improved the hybrid mining strategy performance

We also identified another performance bottleneck, this time when using the hybrid mining strategy while processing bulk events where LogMiner failed to resolve the table name due to object id/version mismatches. The hybrid strategy is designed to handle this use case and fall back to Debezium’s relational model to resolve the table name; however, despite using a cache, the overhead of the cache lookups was significant for bulk operations.

In order to reduce this cost and improve the throughput of bulk operations on unknown tables, we have reworked the lookup in a way that significantly increases throughput, allowing bulk operations to be handled more efficiently and with less overall CPU utilization (DBZ-8925).

Improve log message when failing to apply a partial rollback

When using the Debezium for Oracle LogMiner buffered adapter, the log entry written when a partial rollback is observed did not capture critical information that could be useful for debugging. Debezium 3.2 now includes the transaction identifier and the system change number associated with the partial rollback redo entry (DBZ-8944).

Debezium JDBC sink

Configuration available to column/table naming strategies

The Debezium JDBC sink offers a variety of configurable hooks, including the option to define deployment-specific TableNamingStrategy or ColumnNamingStrategy implementations. However, these implementations were previously unable to obtain the full configuration of the connector deployment, making these hooks far less flexible than intended.

With Debezium 3.2, these strategy contracts provide a new configure method (DBZ-7051), shown here:

@Override
public void configure(Map<String, Object> config) {
}

To provide backward compatibility, this method is defined as default with no implementation, so no code changes are required where this configuration step is unnecessary.
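As a sketch of how this could be used, a custom strategy might read a deployment-specific property in configure and apply it when resolving names later. The class below only illustrates the new hook; the strategy interface’s resolution methods are omitted and the property name is hypothetical:

import java.util.Map;

public class PrefixAwareTableNamingStrategy /* implements TableNamingStrategy */ {

    private String prefix = "";

    // New hook from DBZ-7051: receives the full connector configuration
    public void configure(Map<String, Object> config) {
        // "custom.table.prefix" is an illustrative property, not a built-in option
        Object value = config.get("custom.table.prefix");
        if (value != null) {
            prefix = value.toString();
        }
    }

    // The strategy's naming method(s) would then prepend the configured prefix
    // to the resolved table name.
}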

Debezium Embedded

New polling started/ended callbacks added

The DebeziumEngine interface provides a variety of features, including knowing when the connector or task has started or stopped. However, because the connector operates asynchronously, it may be useful for specific user-driven logic to know when the connector has entered the polling phase.

To address this concern, two new methods have been added to the ConnectorCallback contract (DBZ-8948):

/**
 * Called after all the tasks have been successfully started and engine has moved to polling phase, but before actual polling is started.
 */
default void pollingStarted() {
    // nothing by default
}

/**
 * Called after all the tasks have successfully exited from the polling loop, i.e. the callback is not called when any of the tasks has thrown
 * exception during polling or was interrupted abruptly.
 */
default void pollingStopped() {
    // nothing by default
}

Now, when setting up a DebeziumEngine instance, the supplied ConnectorCallback implementation can include logic to run during these lifecycle state changes as needed.
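A minimal sketch of wiring these callbacks into an embedded engine might look like the following; the connector properties and the choice of the JSON format are illustrative:

import java.util.Properties;

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

public class PollingCallbackExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // connector and offset storage properties omitted for brevity

        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .using(new DebeziumEngine.ConnectorCallback() {
                    @Override
                    public void pollingStarted() {
                        System.out.println("Engine entered the polling phase");
                    }

                    @Override
                    public void pollingStopped() {
                        System.out.println("Engine exited the polling loop");
                    }
                })
                .notifying(record -> {
                    // handle each change event
                })
                .build();

        // Typically the engine is executed on a dedicated executor; run() blocks until stopped.
        engine.run();
    }
}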

Debezium Server

Milvus allows unwinding of JSON data types

Debezium source connectors are designed to emit JSON data type values as an io.debezium.data.Json semantic type that encodes the JSON value as a string. However, this may not always be the desired outcome when sinking changes to Milvus using Debezium Server.

A new configuration property, debezium.sink.milvus.unwind.json, has been added that can be set to either true or false (the default). When this property is set to true, the JSON string value will be represented as a JsonObject instead (DBZ-8909).
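In a Debezium Server application.properties file this might look like the following sketch (the sink type line is shown only for context, and any other Milvus connection settings are omitted):

debezium.sink.type=milvus
debezium.sink.milvus.unwind.json=true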

Redis can skip heartbeat messages

Debezium source connectors are often configured with heartbeat events so that the offset information for the source does not become stale during periods of low activity. However, for some sinks like Redis, these heartbeat events aren’t useful to pass along to the sink target.

A new configuration property, debezium.sink.redis.skip.heartbeat.messages, has been added that can be set to either true or false (the default). When this property is set to true, the Redis sink will skip emitting heartbeat events to the Redis target; however, the heartbeat events will continue to influence the management of stale offsets (DBZ-8911).
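A corresponding sketch for the Redis sink, with an illustrative address:

debezium.sink.type=redis
debezium.sink.redis.address=localhost:6379
debezium.sink.redis.skip.heartbeat.messages=true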

Debezium AI

Introduce timeout for Ollama embedding models

We have added a new configuration property, ollama.operation.timeout.ms, for the Debezium AI Ollama model integration used by the FieldToEmbedding transformation. This configuration property specifies the number of milliseconds that the model operation is allowed to execute before the request times out. By default, the transformation waits 15 seconds, but this can be adjusted as needed (DBZ-8908).
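For example, assuming the transformation is registered under the alias embedding (the alias and the 30-second value are illustrative, and the transform’s type and other required properties are omitted), the timeout could be raised like so:

{
  "transforms": "embedding",
  "transforms.embedding.ollama.operation.timeout.ms": "30000"
}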

Debezium Platform

Improve navigation and workflow for transformations

We have added several new features to the Debezium Management Platform interface for transformations (DBZ-8328), including:

  • A new main navigation menu option called Transforms.

  • Improved the pipeline creation workflow allowing transformations to be added in-flight.

  • Support for modifying and removing existing transformations.

We’d love your feedback on the new navigation workflow and improvements around transformations.

Other changes

  • Incorrect NumberOfEventsFiltered metrics in streaming DBZ-8576

  • Signal table column names are arbitrary, but delete strategy expects column named id DBZ-8723

  • Prevent write operations in PostgreSQL in read-only mode. DBZ-8743

  • Upgrade MariaDB driver to 3.5.3 DBZ-8758

  • Add Localization support to UI DBZ-8859

  • Upgrade RocketMQ version from 5.1.4 to 5.2.0 DBZ-8864

  • Bump Chicory version and take advantage of latest improvements DBZ-8867

  • When using the Oracle relaxed SQL parser setup, strings with apostrophe followed by comma are trimmed DBZ-8869

  • Oracle Ehcache buffer will silently evict entries when configured size limits are reached DBZ-8874

  • Improve MySQL/MariaDB connector resilience during post-schema recovery reconnect DBZ-8877

  • Transaction events are not removed when transaction event count over threshold DBZ-8880

  • Setting Oracle buffer type to an unsupported/invalid value is not validated properly DBZ-8886

  • Oracle timestamp columns are ignored when temporal mode set to ISOSTRING DBZ-8889

  • Kinesis Connector does not send failed records during retry, it sends records in original batch DBZ-8893

  • DDL parsing fails on "BY USER FOR STATISTICS" virtual column clause DBZ-8895

  • Postgres CapturedTables metric isn’t populated. DBZ-8897

  • Raise more meaningful exception in case of inconsistent post processor config DBZ-8901

  • Update Outbox Extension Quarkus version to 3.21.2 DBZ-8905

  • Update to latest LTS of Quarkus 3.15.4 DBZ-8906

  • FieldToEmbedding SMT fails with NPE for delete records DBZ-8907

  • FieldToEmbedding SMT crashes when source field name is substring of embedding name DBZ-8910

  • Improve performance by removing unnecessary filter check DBZ-8921

  • Async engine doesn’t terminate gracefully upon StopEngineException DBZ-8936

  • Remove unnecessary metadata query and map fetch calls DBZ-8938

  • Processing error because of incomplete date part of DATETIME datatype in MariaDB DBZ-8940

  • [Conductor] Add endpoint to verify correct setup of signal data collection DBZ-8941

  • ORA-08186 invalid timestamp specified occurs when connector is started DBZ-8943

  • Multiple Predicates Don’t Function with the Operator API DBZ-8975

In total, 78 issues were resolved in Debezium 3.2.0.Alpha1. The list of changes can also be found in our release notes.

Chris Cranford

Chris is a software engineer at Red Hat. He previously was a member of the Hibernate ORM team and now works on Debezium. He lives in North Carolina just a few hours from Red Hat towers.



About Debezium

Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.

Get involved

We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas how we can improve Debezium, please let us know or log an issue.