
Another release cadence done, and we’re pleased to announce that the next preview release of Debezium, 3.2.0.Alpha1, is available. This release is built on Kafka 4.0 and includes several breaking changes along with many improvements and bug fixes.
Let’s take a moment and dive into all these changes.
Breaking changes
With any new major release of software, there are often several breaking changes. The Debezium 3.2.0.Alpha1 release is no exception, so let’s discuss the major changes you should be aware of.
Debezium Core
Built on Kafka 4.0
Debezium 3.2 is built and tested using Kafka 4.0 (DBZ-8875).
Debezium for IBMi
Error raised on missing journal files
In previous versions of Debezium for IBMi, if a journal file was missing, the connector resumed from the first available journal. This would lead to silent data loss. This behavior has changed and the connector will now throw an error if a required journal file no longer exists (DBZ-8898).
Debezium for Oracle
Using secure mTLS connections with JKS
When using the Debezium for Oracle connector to establish a secure mTLS connection using Java Keystores (JKS), special configuration is necessary. We have added this information to the Oracle connector documentation (DBZ-8788).
New features and improvements
The following describes all noteworthy new features and improvements in Debezium 3.2.0.Alpha1. For a complete list, be sure to read the release notes for more details.
Debezium Core
Regression with logging performance
In Debezium 3.1, a change was introduced to centralize the logging of sensitive information. Unfortunately, this change introduced a regression, leading to lower performance across a variety of code paths.
This change has been reverted and replaced with an implementation that retains the centralized logging intent while restoring the prior performance (DBZ-8879).
Reset certain streaming metrics through JMX
During periods of idle activity, a Debezium connector will continue to report the LagBehindSource JMX metric as the last computed value, as this value is only updated when new changes are received. In some environments, this is undesirable or unintuitive if you are unaware of the idle or low-activity window.
Debezium 3.1.1.Final introduces a new option that can be triggered through JConsole or other JMX integrations to reset the current LagBehindSource metric by calling the new function resetLagBehindSource (DBZ-8885).
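For anyone who would rather script the reset than click through JConsole, the following is a minimal sketch of invoking an operation by name through the standard javax.management API. The StandInMetrics MBean, its object name, and the choice of 0 as the post-reset value are assumptions made up for this demo so that it runs without a connector. Against a real deployment you would connect to the connector JVM's MBean server and target its streaming metrics MBean instead.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class LagResetSketch {

    // Hypothetical stand-in for a connector's streaming metrics MBean.
    // Standard MBean interfaces must be public and named <Class>MBean.
    public interface StandInMetricsMBean {
        long getLagBehindSource();
        void resetLagBehindSource();
    }

    public static class StandInMetrics implements StandInMetricsMBean {
        private volatile long lag = 4200L; // pretend this is a stale value from before an idle window

        @Override
        public long getLagBehindSource() {
            return lag;
        }

        @Override
        public void resetLagBehindSource() {
            lag = 0L; // assumption: the demo resets to 0
        }
    }

    // Registers the stand-in (if needed), invokes the reset operation by name, and reads the metric back.
    static long resetAndRead() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Made-up object name for the demo; a real connector uses its own naming scheme.
            ObjectName name = new ObjectName("example.debezium:type=stand-in-metrics,context=streaming");
            if (!server.isRegistered(name)) {
                server.registerMBean(new StandInMetrics(), name);
            }
            server.invoke(name, "resetLagBehindSource", new Object[0], new String[0]);
            return (Long) server.getAttribute(name, "LagBehindSource");
        } catch (JMException e) {
            throw new IllegalStateException("JMX call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("LagBehindSource after reset: " + resetAndRead());
    }
}
```

The same invoke call works over a remote MBeanServerConnection, which is how a monitoring script would typically reach a connector running in another JVM.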
Log JMX MBean name when registration fails
When the JMX metric MBean fails registration, it’s most often because an existing MBean is already registered with the same name. This can happen for a variety of reasons, including a prior task not stopping gracefully or a misconfiguration where two connectors share the same topic.prefix.
Unfortunately, the logged message did not provide enough detail to identify which MBean and connector the failure relates to. To address this, the JMX MBean name will be logged when the registration fails so that the exact connector deployment that’s affected can be easily determined (DBZ-8862).
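To make the failure mode concrete, here is a small sketch of what happens when two MBeans are registered under one name, and how including the object name in the message identifies the conflict. DemoMetrics, the object name, and the message wording are hypothetical and are not Debezium's actual log output.

```java
import java.lang.management.ManagementFactory;
import javax.management.InstanceAlreadyExistsException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MBeanRegistrationSketch {

    // Minimal stand-in MBean; Standard MBean interfaces must be public.
    public interface DemoMetricsMBean {
        int getValue();
    }

    public static class DemoMetrics implements DemoMetricsMBean {
        @Override
        public int getValue() {
            return 1;
        }
    }

    // Registers two MBeans under the same name and returns the message produced for the collision.
    static String registerTwice(String objectName) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName name = new ObjectName(objectName);
            server.registerMBean(new DemoMetrics(), name); // first registration succeeds
            try {
                server.registerMBean(new DemoMetrics(), name); // second registration collides
                return "no conflict";
            } catch (InstanceAlreadyExistsException e) {
                // Including the object name tells the operator which deployment is affected.
                return "Unable to register metrics MBean '" + name + "': an MBean with this name already exists";
            } finally {
                server.unregisterMBean(name); // clean up the first registration
            }
        } catch (JMException e) {
            throw new IllegalStateException("Unexpected JMX failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(registerTwice("example.debezium:type=demo-metrics,server=shared-prefix"));
    }
}
```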
Debezium for IBMi
Add support for BOOLEAN data types
IBM provides support for the BOOLEAN data type on V7R5+ and later V7R4 releases of the IBM iSeries database. In prior versions, if a table contained a boolean data type, this would result in a hard failure with the Debezium for IBMi connector.
With Debezium 3.2, BOOLEAN data types are now captured without throwing an exception (DBZ-7796).
Add decimal handling mode support
The decimal.handling.mode configuration property specifies how the connector should handle floating point values for connector-specific data types. In prior versions of the Debezium for IBMi connector, this configuration property was not honored.
Debezium 3.2 introduces support for decimal.handling.mode, allowing you to specify how floating point values are serialized (DBZ-8301). This configuration property can be set to one of three values:
precise - Represents values precisely by using java.math.BigDecimal values, which are represented in change events in a binary form.
double - Represents values by using double values. Using double values is easier, but can result in a loss of precision.
string - Encodes values as formatted strings. Using the string option is easier to consume, but results in a loss of semantic information about the real type.
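To show what the three modes mean for a consumer, the sketch below converts one java.math.BigDecimal into each representation. It is an illustration rather than the connector's code: the convert and decodePrecise helpers are hypothetical. The binary form is assumed to be the unscaled value's two's-complement bytes with the scale carried separately, as in Kafka Connect's Decimal logical type, and the string form is assumed to be the plain decimal string.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class DecimalModesSketch {

    // Converts a value according to the given mode name (hypothetical helper).
    static Object convert(BigDecimal value, String mode) {
        switch (mode) {
            case "precise":
                // Binary form: bytes of the unscaled integer; the scale travels separately.
                return value.unscaledValue().toByteArray();
            case "double":
                // Easy to consume, but may lose precision.
                return value.doubleValue();
            case "string":
                // Exact digits, but the consumer no longer knows it is a decimal type.
                return value.toPlainString();
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    // Rebuilds the exact value from the binary form plus the scale.
    static BigDecimal decodePrecise(byte[] unscaledBytes, int scale) {
        return new BigDecimal(new BigInteger(unscaledBytes), scale);
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("0.1234567890123456789");
        byte[] precise = (byte[]) convert(value, "precise");
        System.out.println("precise round-trip: " + decodePrecise(precise, value.scale()));
        System.out.println("double:             " + convert(value, "double"));
        System.out.println("string:             " + convert(value, "string"));
    }
}
```

Running main with a value that has more significant digits than a double can hold makes the trade-off visible: only the precise and string forms keep every digit.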
Debezium for Oracle
Unbuffered LogMiner adapter using committed data only
Debezium for Oracle introduces a new adapter implementation that buffers transactions on the relational database using the database’s memory buffers rather than relying on the JVM heap or off-heap cache implementations (DBZ-8924).
To get started with the new adapter, the connector configuration needs to be adjusted, as shown here:
{
"database.connection.adapter": "logminer_unbuffered"
}
This new adapter implementation is based on Oracle LogMiner’s COMMITTED_DATA_ONLY mode. This mode forces Oracle LogMiner to use the database’s available memory to buffer transactions and to only supply Debezium with committed changes. Because Debezium only receives committed changes, the connector can immediately dispatch changes as they’re read, which avoids complex memory sizing requirements along with the need to handle partial rollbacks due to savepoints or constraint violations.
Because Oracle LogMiner is now responsible for buffering transactions until the commit is observed, it relies on the database’s memory to handle this staging requirement. This means the staging capacity is bounded by the Oracle database instance parameter PGA_AGGREGATE_LIMIT, as configured by the database administrator. When a transaction’s memory requirement exceeds this limit, Oracle will refuse to buffer the transaction and the connector will fail with an error. This can be resolved by raising the configured limit, removing the limit entirely, or resizing your transactions so they fit within this Oracle limit.
This feature is currently in incubating state, i.e. exact semantics, configuration options etc. may change in future revisions, based on the feedback we receive. Please let us know if you encounter any problems while using this extension.
Filter Oracle LogMiner results by client id
The Oracle LogMiner adapters provide a myriad of ways to filter transactions by explicitly passing filters to the database query, for example excluding or only including transactions performed by specific database usernames. In this release, we’ve added another filter criterion based on the Oracle LogMiner field CLIENT_ID, where you can elect to include or exclude changes based on this field’s value (DBZ-8904).
The following configuration properties can be used:
log.mining.clientid.include.list - Specifies a comma-separated list of values to match against the CLIENT_ID field for capture.
log.mining.clientid.exclude.list - Specifies a comma-separated list of values to match against the CLIENT_ID field to exclude from capture.
Just like any of the other include/exclude configuration properties, these are mutually exclusive.
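The include/exclude semantics can be sketched as a simple predicate. This is only a model of the behavior described above: the connector applies its filters in the database query, whereas this sketch evaluates them in memory. The build helper and the exact, case-sensitive matching are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ClientIdFilterSketch {

    // Splits a comma-separated list into a set of trimmed, non-empty values.
    static Set<String> parse(String list) {
        return Arrays.stream(list.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Builds a predicate that answers "should a change with this CLIENT_ID be captured?".
    static Predicate<String> build(String includeList, String excludeList) {
        boolean hasInclude = includeList != null && !includeList.trim().isEmpty();
        boolean hasExclude = excludeList != null && !excludeList.trim().isEmpty();
        if (hasInclude && hasExclude) {
            // The two properties are mutually exclusive.
            throw new IllegalArgumentException("Specify either an include list or an exclude list, not both");
        }
        if (hasInclude) {
            Set<String> included = parse(includeList);
            return clientId -> included.contains(clientId);
        }
        if (hasExclude) {
            Set<String> excluded = parse(excludeList);
            return clientId -> !excluded.contains(clientId);
        }
        return clientId -> true; // no filter configured: capture everything
    }

    public static void main(String[] args) {
        Predicate<String> filter = build("billing-app, reporting-app", null);
        System.out.println(filter.test("billing-app"));
        System.out.println(filter.test("batch-loader"));
    }
}
```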
Reduced CPU utilization under specific scenarios
In Debezium 3.1, we introduced a change as part of DBZ-8665 to restore the same performance from 2.7.0.Final when processing constraint violations or save point rollback operations. While this change was successful at reducing the latency caused by processing such events in Debezium 3.0 through 3.0.7, we found that even the performance from Debezium 2.7 was overall suboptimal.
We have implemented a complete rework of the transaction buffering solution to handle constraint violations and save point rollbacks more efficiently (DBZ-8860).
When using heap-based buffering, we reduced the time needed to process such events by nearly 90% while also reducing the time complexity for off-heap buffering by 97-99%. In addition to the time complexity reduction, we have also reduced the overall CPU usage while handling these events to remain aligned with expectations.
Improved the online_catalog mining strategy performance
Prior to adding the hybrid mining strategy, the Debezium Oracle LogMiner implementation included a specific condition to include events for tables where LogMiner failed to resolve the table name. This use case happens when the object id and version in the redo entry do not match the online data dictionary, which occurs after specific DDL operations are performed.
Including these events increases latency and connector overhead, particularly when users perform bulk operations and LogMiner fails to resolve the table name, and it does so only so that the connector can log the unknown table. Given the performance cost solely for logging, we have chosen to omit these events from the data fetch moving forward (DBZ-8926).
Improved the hybrid mining strategy performance
We also identified another performance bottleneck, this time when using the hybrid mining strategy while processing bulk events where LogMiner failed to resolve the table name during object id/version mismatches. The hybrid strategy is designed to handle this use case and fall back to Debezium’s relational model to resolve the table name; however, despite using a cache, the overhead of the cache lookups for bulk operations was significant.
To reduce the cost and improve the throughput of bulk operations on unknown tables, we have reworked the lookup in a way that significantly increases throughput, allowing bulk operations to be handled more efficiently and with less overall CPU utilization (DBZ-8925).
Improve log message when failing to apply a partial rollback
With the Debezium for Oracle LogMiner buffered adapter, the log entry written when a partial rollback is observed did not capture critical information that could be useful for debugging purposes. Debezium 3.2 now includes the transaction identifier and the system change number associated with the partial rollback redo entry (DBZ-8944).
Debezium JDBC sink
Configuration available to column/table naming strategies
The Debezium JDBC sink offers a variety of configurable hooks, including the option to define deployment-specific TableNamingStrategy or ColumnNamingStrategy implementations. However, these implementations were unable to obtain the full configuration for the connector deployment, making these hooks far more restrictive than intended.
With Debezium 3.2, these strategy implementations provide a new configure method (DBZ-7051), shown here:
@Override
public void configure(Map<String, Object> config) {
}
To provide backward compatibility, this method is defined as default with no implementation, so no code changes are required where this configuration step is unnecessary.
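As an illustration of what this enables, the sketch below shows a naming strategy that reads a custom setting from the supplied configuration. To keep the example self-contained, ColumnNaming is a simplified stand-in for the sink's strategy contract, and both the resolveColumnName signature used here and the column.naming.prefix property are assumptions invented for the demo.

```java
import java.util.Map;

class NamingStrategySketch {

    // Simplified stand-in for the sink's column naming contract.
    interface ColumnNaming {
        // Default no-op, so existing implementations keep compiling unchanged.
        default void configure(Map<String, Object> config) {
        }

        String resolveColumnName(String fieldName);
    }

    // Example strategy that prepends a prefix taken from the connector configuration.
    static class PrefixingColumnNaming implements ColumnNaming {
        private String prefix = "";

        @Override
        public void configure(Map<String, Object> config) {
            // "column.naming.prefix" is a hypothetical, deployment-specific property.
            Object value = config.get("column.naming.prefix");
            if (value != null) {
                prefix = value.toString();
            }
        }

        @Override
        public String resolveColumnName(String fieldName) {
            return prefix + fieldName;
        }
    }

    public static void main(String[] args) {
        PrefixingColumnNaming strategy = new PrefixingColumnNaming();
        strategy.configure(Map.of("column.naming.prefix", "src_"));
        System.out.println(strategy.resolveColumnName("customer_id"));
    }
}
```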
Debezium Embedded
New polling started/ended callbacks added
The DebeziumEngine interface provides a variety of features, including knowing when the connector or task has started or stopped. However, because the connector operates asynchronously, it may be useful to know when the connector has entered the polling phase for specific user-driven logic.
To address this concern, two new methods have been added to the ConnectorCallback contract (DBZ-8948):
/**
* Called after all the tasks have been successfully started and engine has moved to polling phase, but before actual polling is started.
*/
default void pollingStarted() {
// nothing by default
}
/**
* Called after all the tasks have successfully exited from the polling loop, i.e. the callback is not called when any of the tasks has thrown
* exception during polling or was interrupted abruptly.
*/
default void pollingStopped() {
// nothing by default
}
Now when setting up a DebeziumEngine instance, the supplied ConnectorCallback implementation can include logic to run during these lifecycle state changes as needed.
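One plausible use of these callbacks is a readiness signal that other code can wait on. The sketch below assumes this pattern and uses a stand-in PollingCallback interface that mirrors only the two new default methods, so it compiles without Debezium on the classpath. In an application, the same logic would live in a ConnectorCallback implementation, and ReadinessCallback is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class PollingCallbackSketch {

    // Stand-in mirroring the two new default methods of the callback contract.
    interface PollingCallback {
        default void pollingStarted() {
        }

        default void pollingStopped() {
        }
    }

    // Tracks whether the engine reached the polling phase and whether it left it cleanly.
    static class ReadinessCallback implements PollingCallback {
        private final CountDownLatch startedLatch = new CountDownLatch(1);
        private final AtomicBoolean stoppedCleanly = new AtomicBoolean(false);

        @Override
        public void pollingStarted() {
            startedLatch.countDown(); // release anyone waiting for the polling phase
        }

        @Override
        public void pollingStopped() {
            stoppedCleanly.set(true); // only reached when tasks exit the polling loop normally
        }

        // Waits up to timeoutMs for the polling phase; returns false on timeout or interruption.
        boolean awaitPolling(long timeoutMs) {
            try {
                return startedLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        boolean stoppedCleanly() {
            return stoppedCleanly.get();
        }
    }

    public static void main(String[] args) {
        ReadinessCallback callback = new ReadinessCallback();
        callback.pollingStarted(); // normally invoked by the engine
        System.out.println("polling: " + callback.awaitPolling(100));
    }
}
```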
Debezium Server
Milvus allows unwinding of JSON data types
Debezium source connectors are designed to emit JSON data type values as an io.debezium.data.Json semantic type that encodes the JSON value as a string. However, this may not always be the desired outcome when sinking changes to a Milvus sink using Debezium Server.
A new configuration property, debezium.sink.milvus.unwind.json, has been added that can be set to either true or false (the default). When this property is set to true, the JSON string value will be represented as a JsonObject instead (DBZ-8909).
Redis can skip heartbeat messages
Debezium source connectors are often configured with heartbeat events so that the offset information for the source does not become stale during low-activity periods. However, for some sinks like Redis, passing these heartbeat events to the sink target isn’t useful.
A new configuration property, debezium.sink.redis.skip.heartbeat.messages, has been added that can be set to either true or false (the default). When this property is set to true, the Redis sink will skip emitting heartbeat events to the Redis target; however, the heartbeat events will continue to influence the management of stale offsets (DBZ-8911).
Debezium AI
Introduce timeout for Ollama embedding models
We have added a new configuration property ollama.operation.timeout.ms for the Debezium AI Ollama model integration using the FieldToEmbedding transformation. This configuration property specifies the number of milliseconds that the model operation is allowed to execute before the request is timed out. By default, the transformation waits 15 seconds, but this can be adjusted as needed (DBZ-8908).
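For readers unfamiliar with operation timeouts, the following generic sketch shows one way a millisecond timeout can be enforced around a blocking call in Java. It does not reflect how the Ollama integration is implemented, and callWithTimeout is a hypothetical helper. Only the 15 second default comes from the description above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class OperationTimeoutSketch {

    static final long DEFAULT_TIMEOUT_MS = 15_000L; // the 15 second default mentioned above

    // Runs the operation on another thread and gives up after timeoutMs milliseconds.
    static <T> T callWithTimeout(Supplier<T> operation, long timeoutMs) {
        CompletableFuture<T> future = CompletableFuture.supplyAsync(operation);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Marks the future as cancelled; this does not interrupt the running task.
            future.cancel(true);
            throw new IllegalStateException("Operation timed out after " + timeoutMs + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the operation", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Operation failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // A fast stand-in for an embedding request.
        String result = callWithTimeout(() -> "embedding-ready", DEFAULT_TIMEOUT_MS);
        System.out.println(result);
    }
}
```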
Debezium Platform
Improve navigation and workflow for transformations
We have added several new features to the Debezium Management Platform interface for transformations (DBZ-8328), including:
- A new main navigation menu option called Transforms.
- An improved pipeline creation workflow that allows transformations to be added in-flight.
- Support for modifying and removing existing transformations.
We’d love your feedback on the new navigation workflow and improvements around transformations.
Other changes
- Incorrect NumberOfEventsFiltered metrics in streaming DBZ-8576
- Signal table column names are arbitrary, but delete strategy expects column named id DBZ-8723
- Prevent write operations in PostgreSQL in read-only mode. DBZ-8743
- Upgrade MariaDB driver to 3.5.3 DBZ-8758
- Add Localization support to UI DBZ-8859
- Upgrade RocketMQ version from 5.1.4 to 5.2.0 DBZ-8864
- Bump Chicory version and take advantage of latest improvements DBZ-8867
- When using the Oracle relaxed SQL parser setup, strings with apostrophe followed by comma are trimmed DBZ-8869
- Oracle Ehcache buffer will silently evict entries when configured size limits are reached DBZ-8874
- Improve MySQL/MariaDB connector resilience during post-schema recovery reconnect DBZ-8877
- Transaction events are not removed when transaction event count over threshold DBZ-8880
- Setting Oracle buffer type to an unsupported/invalid value is not validated properly DBZ-8886
- Oracle timestamp columns are ignored when temporal mode set to ISOSTRING DBZ-8889
- Kinesis Connector does not send failed records during retry, it sends records in original batch DBZ-8893
- DDL parsing fails on "BY USER FOR STATISTICS" virtual column clause DBZ-8895
- Postgres CapturedTables metric isn’t populated. DBZ-8897
- Raise more meaningful exception in case of inconsistent post processor config DBZ-8901
- Update Outbox Extension Quarkus version to 3.21.2 DBZ-8905
- Update to latest LTS of Quarkus 3.15.4 DBZ-8906
- FieldToEmbedding SMT fails with NPE for delete records DBZ-8907
- FieldToEmbedding SMT crashes when source field name is substring of embedding name DBZ-8910
- Improve performance by removing unnecessary filter check DBZ-8921
- Async engine doesn’t terminate gracefully upon StopEngineException DBZ-8936
- Remove unnecessary metadata query and map fetch calls DBZ-8938
- Processing error because of incomplete date part of DATETIME datatype in MariaDB DBZ-8940
- [Conductor] Add endpoint to verify correct setup of signal data collection DBZ-8941
- ORA-08186 invalid timestamp specified occurs when connector is started DBZ-8943
- Multiple Predicates Don’t Function with the Operator API DBZ-8975
In total, 78 issues were resolved in Debezium 3.2.0.Alpha1. The list of changes can also be found in our release notes.
A big thank you to all the contributors from the community who worked diligently on this release: Andrea Peruffo, Anil Dasari, Anisha Mohanty, Ashok, Bhagyashree Goyal, Oskar Bonde, Chris Cranford, Gaurav Miglani, Giovanni Panice, Gunnar Morling, Gustavo Lira, Haris Bin Saif, Haris Osmanagić, Jiri Pechanec, Joseph Koshakow, Kavya R, Kavya Ramaiah, Mario Fiore Vitale, Micro Huang, Petar Kostov, Peter Hamer, Philipp Bouzid, Rajendra Dangwal, Robert Roldan, Vojtech Juranek, Yossi Shirizli
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.