As the temperature for summer continues to rise, I’m pleased to announce that Debezium has some really cool news: Debezium 2.7.0.Alpha1 is now available for testing. This release includes a variety of new changes and improvements across various connectors like MongoDB, MariaDB, MySQL, Oracle, Vitess, and the Kubernetes Operator, as well as a myriad of subtle fixes and improvements across the entire Debezium portfolio. Let’s take a moment and dive into some highlights…
Breaking changes
The team aims to avoid any potential breaking changes between minor releases; however, such changes are sometimes inevitable.
- Core
  - It was identified that certain JDBC queries could block indefinitely in the case of certain communication failures. To combat this problem, a new configurable timeout option, query.timeout.ms, is available to set the maximum time that a JDBC query can execute before being terminated (DBZ-7616).
- SQL Server
  - The SQL Server connector previously processed all transactions captured during a single database round trip. This behavior is configurable and is based on max.iterations.transactions, which defaulted to processing all transactions (value of 0). This could lead to unexpected out-of-memory conditions if your database has a high volume of transactions.
    To address this, the default value for max.iterations.transactions has changed to 500, making the connector more resilient for these deployments out-of-the-box. If you want to return to the previous behavior, simply add this configuration option to your connector with a value of 0 (DBZ-7750).
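As a rough illustration of the two options above, the following sketch builds a connector registration payload that sets both explicitly. The option names query.timeout.ms and max.iterations.transactions come from this post; the connector name, hostname, and the 60000 ms timeout are placeholder assumptions, not recommended values.

```python
import json

# Hypothetical base configuration; the hostname is a placeholder.
base_config = {
    "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
    "database.hostname": "sqlserver.example.internal",
}


def with_overrides(config, query_timeout_ms, max_iterations_transactions):
    """Return a copy of config with both options from this release set explicitly."""
    updated = dict(config)
    # Kafka Connect configuration values are passed as strings.
    updated["query.timeout.ms"] = str(query_timeout_ms)
    updated["max.iterations.transactions"] = str(max_iterations_transactions)
    return updated


# A value of 0 restores the pre-2.7 behavior of processing all transactions.
connector_config = with_overrides(
    base_config, query_timeout_ms=60000, max_iterations_transactions=0
)

# Shape of a Kafka Connect REST registration body: a name plus a config map.
print(json.dumps({"name": "inventory-connector", "config": connector_config}, indent=2))
```

Leaving max.iterations.transactions out of the configuration entirely picks up the new default of 500.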
New features and improvements
Debezium 2.7.0.Alpha1 also introduces many improvements and features. Let’s take a look at each individually.
Install Debezium Operator with Helm Chart
To improve the deployment of the Debezium Operator, it can now be installed with a Helm chart available at https://charts.debezium.io. This avoids the overly complicated deployment model of installing the operator into separate namespaces, minimizing the complexities of managing multiple Debezium Server deployments on Kubernetes.
Support predicate conditions for MongoDB incremental snapshots
The incremental snapshot process is an instrumental part of various recovery situations, collecting all or part of the data set from a source table or collection. Relational connectors have long supported supplying an additional-conditions value on the incremental snapshot signal to restrict the data set, providing for targeted resynchronization of specific rows of data.
We’re happy to announce that this is now possible with MongoDB (DBZ-7138). Unlike relational databases, the additional-conditions value should be supplied in JSON format. It will be applied to the specified collection using the find operation to obtain the subset of documents that are to be incrementally snapshotted.
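To make the idea concrete, here is a minimal sketch of how the data portion of such a signal might be assembled. The field layout (data-collections, type, and additional-conditions entries with data-collection and filter keys) is modeled on the relational signal format and is an assumption here, as are the collection name and the filter itself; please check the MongoDB connector documentation for the exact shape.

```python
import json


def build_signal_data(collection, mongo_filter):
    """Build the data portion of an incremental snapshot signal (assumed layout)."""
    return {
        "data-collections": [collection],
        "type": "incremental",
        "additional-conditions": [
            {
                "data-collection": collection,
                # Assumption: the condition is passed as a JSON string that
                # the connector hands to a find operation on the collection.
                "filter": json.dumps(mongo_filter),
            }
        ],
    }


# Hypothetical collection and filter, for illustration only.
signal_data = build_signal_data("inventory.products", {"color": "blue"})
print(json.dumps(signal_data, indent=2))
```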
New MariaDB standalone connector
Debezium 2.5 introduced official support for MariaDB as part of the existing MySQL connector. The next step in that evolution is here, with a new standalone connector implementation for MariaDB (DBZ-7693).
There are a few things worth noting here:
- MariaDB and MySQL both have a shared dependency on a new abstract connector called debezium-connector-binlog, which provides the common framework for both binlog-based connectors.
- Each standalone connector is now tailored specifically to its target database, so MySQL users should use the MySQL connector and MariaDB users should use the MariaDB connector. As a result, the connection.adapter configuration option has been removed, and the jdbc.protocol configuration option is now specific to certain MySQL use cases and is not used by MariaDB.
The documentation for this connector is still a work-in-progress and will be added in the future. For the moment, you can refer to the MySQL connector documentation for most things related to MariaDB.
ExtractNewDocumentState includes document id for MongoDB deletes
In prior releases of the MongoDB ExtractNewDocumentState single message transformation, a delete event did not provide the identifier as part of the payload. This reduced the meaningfulness of delete events, as consumers were supplied with insufficient data to act on these events. This behavior has been improved, and the delete event now includes an _id attribute in the payload (DBZ-7695).
Transaction metadata encoded ordering
In some pipelines, ordering is critical for consuming applications. Certain scenarios can disturb the ordering in your data pipeline, such as when a Kafka re-partition occurs, and trying to reconstruct the ordering after the fact is error-prone.
Now when Transaction Metadata is enabled, these metadata events will also encode their transaction order, so that in the event that a Kafka re-partition or other scenarios occur that alter the ordering semantics, consumers can simply use the new encoded ordering field instead for deterministic ordering of transactions (DBZ-7698).
Blocking incremental snapshot improvements
There are some use cases where incremental snapshot signals require escaping certain characters in the fully-qualified table name. This caused some problems with blocking snapshots because the process to resolve what tables to snapshot used a slightly different mechanism. In Debezium 2.7, we’ve unified this approach, and you can now use escaped table names with blocking snapshots where applicable (DBZ-7718).
Cassandra performance improvement
The Cassandra connector also saw some changes in Debezium 2.7, specifically performance optimizations. The implementation of the KafkaRecordEmitter relied on a thread-synchronization block that reduced throughput. In addition, the implementation performed some unnecessary flushing, which also impacted performance. This code has been rewritten to both improve throughput and reduce the unnecessary flush calls (DBZ-7722).
New Oracle "RawToString" custom converter
While Oracle recommends that users avoid using RAW-based columns, these columns are still widely used in standard Oracle tables for backward compatibility reasons. But there are also business use cases where it makes sense to continue to use RAW columns rather than other data types.
Debezium 2.7 introduces a new custom converter specifically for Oracle called RawToStringConverter (DBZ-7753). This custom converter is designed to allow you to quickly convert the byte-array contents of the RAW column to a string-based field using a STRING schema type. This can be useful for situations where you use a RAW column to store character data that doesn’t require the collation overhead of VARCHAR2, but you still have the need for this field to be sent to consumers as string-based data.
To get started with this custom converter, please see the documentation for more details.
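As a starting point, the sketch below registers the converter using Debezium’s general custom converter convention: a converters list of logical names plus a name.type property for each. The logical name raw-to-string and the fully qualified class name are assumptions on my part, so verify them against the Oracle connector documentation before using this.

```python
def register_converter(config, name, converter_class):
    """Return a copy of config with a custom converter registered under name."""
    updated = dict(config)
    # "converters" holds a comma-separated list of logical converter names.
    names = [n for n in updated.get("converters", "").split(",") if n]
    if name not in names:
        names.append(name)
    updated["converters"] = ",".join(names)
    # Each logical name gets a "<name>.type" property naming the implementation.
    updated[f"{name}.type"] = converter_class
    return updated


oracle_config = register_converter(
    {"connector.class": "io.debezium.connector.oracle.OracleConnector"},
    "raw-to-string",  # hypothetical logical name
    # Assumed package; only the simple class name appears in this post.
    "io.debezium.connector.oracle.converters.RawToStringConverter",
)
print(oracle_config)
```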
Improved NLS character-set support for Oracle
When installing the Debezium 2.7 Oracle connector, you may notice a new dependency, orai18n.jar. This dependency is being automatically distributed to provide extended character-set support for certain dialects (DBZ-7761).
Improved temporal support in Vitess
Debezium relational connectors rely on a configuration option, time.precision.mode, to control how temporal values are added to change events. In some cases, you may want to use modes that align with Kafka types, using the connect mode. In other cases, you may prefer to avoid precision loss by using the default, adaptive_milliseconds mode.
The Debezium connector for Vitess has traditionally not followed this model, and instead has emitted temporal values as string-based types. While this helps avoid the loss of precision that can occur with the connect mode, it adds unnecessary overhead on consumers, which must parse and manipulate these values.
In Debezium 2.7, Vitess aligns this behavior with other relational connectors, using the time.precision.mode to control how temporal values are sent (DBZ-7773). By default, it will use the adaptive_milliseconds mode, but you can customize this to use connect mode if you prefer. The emission of string-based temporal values has been removed.
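Consumers that previously parsed string-based temporal values will now receive numeric ones. The sketch below shows one way a consumer might decode a timestamp, assuming the field uses Kafka Connect’s Timestamp logical type (milliseconds since the Unix epoch) under the connect mode. Interpreting the value as UTC is also an assumption; the exact representation per column type is best confirmed in the Vitess connector documentation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def connect_timestamp_to_datetime(epoch_millis):
    """Convert milliseconds since the Unix epoch to a timezone-aware datetime."""
    # Adding a timedelta avoids the float rounding of fromtimestamp(ms / 1000).
    return EPOCH + timedelta(milliseconds=epoch_millis)


print(connect_timestamp_to_datetime(1_700_000_000_000).isoformat())
```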
Other changes
Altogether, 50 issues were fixed in this release. Here is a list of some additional noteworthy changes:
- Builtin database name filter is incorrectly applied only to collections instead of databases in snapshot DBZ-7485
- Upgrade Debezium Quarkus Outbox to Quarkus 3.9.2 DBZ-7663
- After the initial deployment of Debezium, if a new table is added to MSSQL, its schema is was captured DBZ-7697
- The test is failing because wrong topics are used DBZ-7715
- Incremental Snapshot: read duplicate data when database has 1000 tables DBZ-7716
- Handle instability in JDBC connector system tests DBZ-7726
- SQLServerConnectorIT.shouldNotStreamWhenUsingSnapshotModeInitialOnly check an old log message DBZ-7729
- Fix MongoDB unwrap SMT test DBZ-7731
- Snapshot fails with an error of invalid lock DBZ-7732
- Column CON_ID queried on V$THREAD is not available in Oracle 11 DBZ-7737
- Redis NOAUTH Authentication Error when DB index is specified DBZ-7740
- Getting oldest transaction in Oracle buffer can cause NoSuchElementException with Infinispan DBZ-7741
- The MySQL Debezium connector is not doing the snapshot after the reset. DBZ-7743
- MongoDb connector doesn’t work with Load Balanced cluster DBZ-7744
- Align unwrap tests to respect AT LEAST ONCE delivery DBZ-7746
- Exclude reload4j from Kafka connect dependencies in system testsuite DBZ-7748
- Pod Security Context not set from template DBZ-7749
- Apply MySQL binlog client version 0.29.1 - bugfix: read long value when deserializing gtid transaction’s length DBZ-7757
- Change streaming exceptions are swallowed by BufferedChangeStreamCursor DBZ-7759
- Use thread cap only for default value DBZ-7763
- Evaluate cached thread pool as the default option for async embedded engine DBZ-7764
- Sql-Server connector fails after initial start / processed record on subsequent starts DBZ-7765
- Valid resume token is considered invalid which leads to new snapshot with some snapshot modes DBZ-7770
- Improve processing speed of async engine processors which use List#get() DBZ-7777
- NO_DATA snapshot mode validation throw DebeziumException on restarts if snapshot is not completed DBZ-7780
- DDL statement couldn’t be parsed DBZ-7788
- Document potential null values in the after field for lookup full update type DBZ-7789
- old class reference in ibmi-connector services DBZ-7795
- Documentation for Debezium Scripting mentions wrong property DBZ-7798
- Fix invalid date/timestamp check & logging level DBZ-7811
A huge thank you to all contributors from the community who worked on this release: Amirmohammad Sadat Shokouhi, Andrey Pustovetov, Anisha Mohanty, Chris Cranford, Chris Recalis, Jakub Cechacek, Jiri Novotny, Jiri Pechanec, Jochen Schalanda, Lourens Naudé, Mario Fiore Vitale, Martin Medek, Ondrej Babec, Rajendra Dangwal, Robert Roldan, Robin Moffatt, Roman Kudryashov, Selman Genç, Thomas Thornton, Vojtech Juranek, and ismail simsek!
What’s next?
Debezium 2.7 is just getting underway and we have a number of additional changes planned, including a MongoDB sink connector, expanded Oracle 23 support, a new SPI to help with the memory footprint of certain multi-tenant schema architectures, and more. You can find more about what is planned for Debezium 2.7 on our road map.
The team is also in the final stages of defining our face-to-face agenda. If you have any suggestions or ideas that you would like us to discuss, or that you would like to see planned in 2.7 or a future release, please feel free to get in touch with us on our mailing list or in our Zulip chat.
Until next time…
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.