
It has only been three weeks since we released Debezium 3.1.0.Final, and we’re happy to report the first maintenance release has arrived, 3.1.1.Final. This release includes several critical performance improvements and a variety of bug fixes.
In this post, we’re going to take a deep dive into the performance improvements made across several key modules of Debezium, discuss the new features, and explain any changes that could impact your upgrade process. As always, we recommend you read the release notes to learn about all the bugs that were fixed, update procedures, and more.
Breaking changes
With any new release of software, there are often breaking changes. This release is no exception, so let’s discuss the major changes you should be aware of before upgrading to Debezium 3.1.1.Final.
Debezium for Oracle
Using secure mTLS connections with JKS
When using the Debezium for Oracle connector to establish a secure mTLS connection using Java Keystores (JKS), special configuration is necessary. We have added this information to the Oracle connector documentation (DBZ-8788).
New features and improvements
The following describes all noteworthy new features and improvements in Debezium 3.1.1.Final. For a complete list, be sure to read the release notes.
Debezium Core
Regression with logging performance
In Debezium 3.1, a change was introduced to centralize the logging of sensitive information. Unfortunately, this change introduced a regression, leading to lower performance across a variety of code paths.
This change has been reverted and replaced with an implementation that retains the centralized logging intent while restoring the prior performance (DBZ-8879).
Reset certain streaming metrics through JMX
During idle periods, a Debezium connector continues to report the LagBehindSource JMX metric as the last computed value, because this value is only updated as new changes are received. In some environments this is undesirable, or unintuitive if you are unaware of the idle or low-activity window.
Debezium 3.1.1.Final introduces a new operation that can be triggered through JConsole or other JMX integrations to reset the current LagBehindSource metric by calling the new function resetLagBehindSource (DBZ-8885).
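If you would rather trigger the reset from code than from JConsole, the standard JMX API can invoke an operation by name. The following is a minimal sketch, not Debezium code: it registers a hypothetical stub MBean locally so the example runs on its own, and the ObjectName, the stub's LagBehindSource attribute, and the reset value of 0 are all assumptions made for illustration. Only the operation name resetLagBehindSource comes from this release.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

class ResetLagExample {

    // Hypothetical stand-in for a connector's streaming metrics MBean (not Debezium code).
    public interface StreamingMetricsStubMBean {
        long getLagBehindSource();
        void resetLagBehindSource();
    }

    static final class StreamingMetricsStub implements StreamingMetricsStubMBean {
        // Pretend this stale value was computed from the last event before an idle window.
        private volatile long lag = 4_200L;

        @Override public long getLagBehindSource() { return lag; }

        // Assumption: the stub resets to 0; the value a real connector reports may differ.
        @Override public void resetLagBehindSource() { lag = 0L; }
    }

    // Invokes the no-argument reset operation by name, then reads the attribute back.
    static long resetAndRead(MBeanServer server, ObjectName name) {
        try {
            server.invoke(name, "resetLagBehindSource", new Object[0], new String[0]);
            return (Long) server.getAttribute(name, "LagBehindSource");
        } catch (Exception e) {
            throw new IllegalStateException("JMX call failed for " + name, e);
        }
    }

    // Registers the stub in the local platform MBean server and resets it.
    static long demo() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            // Assumed name for the demo only; look up your connector's actual MBean name.
            ObjectName name = new ObjectName(
                    "example.debezium:type=connector-metrics,context=streaming,server=demo");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(
                    new StandardMBean(new StreamingMetricsStub(), StreamingMetricsStubMBean.class),
                    name);
            return resetAndRead(server, name);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Lag after reset: " + demo());
    }
}
```

Against a real deployment, you would obtain an MBeanServerConnection through a JMX connector to the Kafka Connect process instead of the local platform server, and pass the streaming metrics MBean name of your connector to the same invoke call.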
Improved exceptions during post processing configuration validation
When supplying a post.processor configuration, if there is a mismatch or missing configuration for a named post processor, Debezium would throw a NullPointerException. In such scenarios, it is preferable for Debezium to raise a custom exception with a more descriptive reason for the failure.
Debezium 3.1.1.Final now validates these configurations more thoroughly and reports a clear error stating that the post processing configuration is invalid or missing required values (DBZ-8901).
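To illustrate the kind of mismatch this validation is meant to catch, here is a small sketch of an up-front check. It is not the connector's implementation: the class and method names are hypothetical, and it assumes a comma-separated post.processors list in which each name must have a matching <name>.type property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PostProcessorConfigCheck {

    // Returns human-readable problems; an empty list means every named processor has a type.
    static List<String> validate(Map<String, String> config) {
        List<String> problems = new ArrayList<>();
        // Assumed key layout: "post.processors" lists names, "<name>.type" gives the class.
        String names = config.getOrDefault("post.processors", "");
        for (String raw : names.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // nothing configured, or a stray comma
            }
            String type = config.get(name + ".type");
            if (type == null || type.trim().isEmpty()) {
                problems.add("Post processor '" + name
                        + "' is missing required property '" + name + ".type'");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // The processor is named "reselector" but the type is declared under "reselect".
        Map<String, String> config = Map.of(
                "post.processors", "reselector",
                "reselect.type", "io.debezium.processors.reselect.ReselectColumnsPostProcessor");
        validate(config).forEach(System.out::println);
    }
}
```

Reporting the offending name and the missing key in the message is what makes this kind of failure actionable, compared to a NullPointerException raised later when the processor is instantiated.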
Debezium AI
Improved resilience of FieldToEmbedding SMT
There were several use cases where the FieldToEmbedding transformation did not perform as expected. For example, when processing delete records, the transformation would fail with a NullPointerException (DBZ-8907), while in other cases the transformation would crash when source field names are substrings of an embedded name (DBZ-8910).
These problems have been corrected and the FieldToEmbedding transformation should be more resilient.
Debezium for Oracle
Reduced CPU utilization under specific scenarios
In Debezium 3.1, we introduced a change as part of DBZ-8665 to restore the same performance from 2.7.0.Final when processing constraint violations or save point rollback operations. While this change was successful at reducing the latency caused by processing such events in Debezium 3.0 through 3.0.7, we found that even the performance from Debezium 2.7 was overall suboptimal.
We have implemented a complete rework of the transaction buffering solution to handle constraint violations and save point rollbacks more efficiently (DBZ-8860).
When using heap-based buffering, we reduced the time needed to process such events by nearly 90% while also reducing the time complexity for off-heap buffering by 97-99%. In addition to the time complexity reduction, we have also reduced the overall CPU usage while handling these events to remain aligned with expectations.
Improved the online_catalog mining strategy performance
Prior to adding the hybrid mining strategy, the Debezium Oracle LogMiner implementation included a specific condition to include events for tables where LogMiner failed to resolve the table name. This use case happens when the object id and version in the redo entry do not match the online data dictionary, which occurs after specific DDL operations are performed.
Including these events, particularly when users perform bulk operations and LogMiner fails to resolve the table name, increases latency and connector overhead only so that the connector can log the unknown table. Given that this performance cost exists solely for logging, we have chosen to omit these events from the data fetch moving forward (DBZ-8926).
Improved the hybrid mining strategy performance
We also identified another performance bottleneck, this time when using the hybrid mining strategy while processing bulk events where LogMiner failed to resolve the table name during object id/version mismatches. The hybrid strategy is designed to handle this use case and fall back to Debezium’s relational model to resolve the table name; however, despite using a cache, the overhead of the cache lookups for bulk operations was significant.
To reduce the cost and improve the throughput of bulk operations on unknown tables, we have reworked the lookup in a way that significantly increases throughput, allowing bulk operations to be handled more efficiently and with less overall CPU utilization (DBZ-8925).
Summary
In total, 28 issues were resolved in Debezium 3.1.1.Final. The list of changes can also be found in our release notes.
A big thank you to all the contributors from the community who worked diligently on this release:
Alvar Viana, Andrea Peruffo, Anisha Mohanty, Ashok, Bhagyashree Goyal, Oskar Bonde, Chris Cranford, Giovanni Panice, Haris Osmanagić, Jakub Cechacek, James Johnston, Jiri Pechanec, Katerina Galieva, Katsumi Miyajima, Krzysztof Grzechnik, Mario Fiore Vitale, Markus Kull, Minjae Lee, Nathan Smit, Rajendra Dangwal, Robert Roldan, Roman Kudryashov, Thomas Thornton, Vadzim Ramanenka, Victor Castaño, Vojtech Juranek, Vojtěch Juránek, Yuriy Vikulov, Zakariae Ben Allal, Zhongqiang Gong, kesompochy, حمود سمبول
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.