I’m pleased to announce the release of Debezium 1.6.0.Beta1!
This release introduces incremental snapshot support for SQL Server and Db2, performance improvements for SQL Server, support for BLOB/CLOB for Oracle, and much more. Let’s take a few moments to explore some of these new features.
Incremental Snapshotting - SQL Server / Db2
Debezium first introduced incremental snapshotting in 1.6.0.Alpha1. As discussed in this blog post, the feature addresses several pain points that exist when running Debezium:
- the necessity to execute consistent snapshots before streaming has begun upon connector restarts
- inability to trigger full or even partial snapshots after having the connector running for extended periods of time
With this release, this feature has been extended to both the SQL Server and Db2 connectors. We intend to continue to roll this feature out to additional connectors in future releases.
If you would like to try the feature yourself, you need to:
- provide a signalling table
- trigger an ad-hoc incremental snapshot by using a SQL command like
INSERT INTO myschema.debezium_signal VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["schema1.table1", "schema1.table2"]}')
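As a rough sketch of these two steps, the Python snippet below uses an in-memory SQLite database as a stand-in for the real source database. The table name debezium_signal and the values in the signal row come from the statement above. The column names (id, type, data), their sizes, and the helpers build_snapshot_payload and send_snapshot_signal are assumptions made for illustration, and the myschema prefix is dropped because SQLite has no schemas. The connector also has to be configured to watch the signalling table, which is not shown here.

```python
import json
import sqlite3


def build_snapshot_payload(tables):
    """Return the JSON document stored in the third column of the signal row."""
    return json.dumps({"data-collections": list(tables)})


def send_snapshot_signal(conn, signal_id, tables):
    """Insert an 'execute-snapshot' row, mirroring the INSERT statement above."""
    conn.execute(
        "INSERT INTO debezium_signal VALUES (?, ?, ?)",
        (signal_id, "execute-snapshot", build_snapshot_payload(tables)),
    )
    conn.commit()


# SQLite stands in for SQL Server or Db2; the column layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE debezium_signal ("
    "id VARCHAR(42) PRIMARY KEY, type VARCHAR(32) NOT NULL, data VARCHAR(2048))"
)

send_snapshot_signal(conn, "ad-hoc-1", ["schema1.table1", "schema1.table2"])
row = conn.execute("SELECT id, type, data FROM debezium_signal").fetchone()
print(row)
```

Building the payload with json.dumps rather than by string concatenation keeps the quoting of the table names correct, and the parameterized INSERT avoids having to escape the JSON inside the SQL text.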
SQL Server Performance Improvement
The SQL Server connector option, source.timestamp.mode, controls how the timestamp for an emitted event is resolved. The default commit setting is designed to resolve the timestamp based on when the change record was committed in the database. It was identified that this method used separate JDBC calls to resolve the timestamp for an event, which caused a loss in both performance and throughput.
This release fixes the commit mode performance problem by moving where the timestamp is resolved. This substantially increases the connector’s performance and throughput while maintaining existing functionality.
We would like to thank Sergei Morozov for identifying and contributing a solution to this problem.
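To illustrate why a separate call per event is costly, here is a toy Python sketch that counts database round trips. It is not the connector’s code and does not show how the fix was actually implemented. FakeDatabase, its methods, and the sample LSN and timestamp values are all hypothetical; the only point is that resolving timestamps alongside the change rows needs fewer calls than looking each one up individually.

```python
class FakeDatabase:
    """Toy stand-in for a database that counts round trips (hypothetical)."""

    def __init__(self, commit_times):
        self.commit_times = commit_times  # maps LSN -> commit timestamp
        self.round_trips = 0

    def fetch_changes(self):
        self.round_trips += 1
        return [{"lsn": lsn} for lsn in self.commit_times]

    def fetch_commit_time(self, lsn):
        self.round_trips += 1
        return self.commit_times[lsn]

    def fetch_changes_with_commit_time(self):
        self.round_trips += 1
        return [{"lsn": lsn, "ts": ts} for lsn, ts in self.commit_times.items()]


def separate_lookups(db):
    """One call for the changes, then one extra call per event."""
    events = db.fetch_changes()
    for event in events:
        event["ts"] = db.fetch_commit_time(event["lsn"])
    return events


def resolved_with_changes(db):
    """Timestamps arrive together with the change rows in a single call."""
    return db.fetch_changes_with_commit_time()


times = {1: 100, 2: 101, 3: 102}  # made-up LSNs and timestamps
slow_db, fast_db = FakeDatabase(times), FakeDatabase(times)
same_events = separate_lookups(slow_db) == resolved_with_changes(fast_db)
print(same_events, slow_db.round_trips, fast_db.round_trips)
```

Both approaches produce the same events, but the first needs one round trip per event on top of the initial fetch, so its cost grows with the number of changes.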
Oracle Large Object Data Types
In the era of "Big Data", it’s not uncommon to use data types such as BLOB and CLOB to store large object data. The Debezium Oracle connector already supports a wide range of data types, and we’re happy to report that we’ve now extended that support to cover BLOB and CLOB for both the XStream and LogMiner based implementations.
When emitting events that contain BLOB or CLOB data, the memory footprint of the connector as well as the emitted event’s message size will be directly impacted by the size of the large object data. As a result, the connector’s JVM process may require additional memory, and some Kafka configurations, such as message.max.bytes, may need to be adjusted.
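As a rough aid for sizing, the following Python sketch derives candidate Kafka size limits from the largest BLOB or CLOB value you expect. The helper kafka_size_settings and its overhead_factor default are assumptions for illustration, not recommendations from the Debezium project. The property names are standard Kafka settings, but which ones you need to change depends on your deployment.

```python
def kafka_size_settings(max_lob_bytes, overhead_factor=1.5):
    """Suggest size limits large enough for an event carrying one LOB value.

    overhead_factor is a guess that leaves room for the event's key, schema,
    encoding, and remaining columns; measure real events before relying on it.
    """
    limit = int(max_lob_bytes * overhead_factor)
    return {
        "broker": {
            "message.max.bytes": limit,
            "replica.fetch.max.bytes": limit,
        },
        "topic": {"max.message.bytes": limit},
        "producer": {"max.request.size": limit},
        "consumer": {"max.partition.fetch.bytes": limit},
    }


# Example: the largest expected large object is 10 MiB.
settings = kafka_size_settings(10 * 1024 * 1024)
for scope, props in settings.items():
    for name, value in props.items():
        print(f"{scope}: {name}={value}")
```

Raising the limit in only one place is usually not enough: an event has to fit through the producer, the broker, replication, and the consumer, so it helps to change these limits together.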
We encourage the community to test drive the support for these new data types and report any and all feedback.
Other Features
Further fixes and improvements in this release include the following:
- The Debezium connector for Oracle now supports ALTER TABLE and DROP TABLE automatically (DBZ-2916)
- The Debezium connector for Oracle is tested and validated using ojdbc.jar version 21.1.0.0 (DBZ-3460)
- The Debezium connector for MongoDB could lose change events when a long-running snapshot lasted longer than the configured oplog window (DBZ-3331); the connector now validates the oplog position’s existence when streaming starts
- The Debezium connector for Cassandra was not responding to schema changes correctly (DBZ-3417)
Altogether, a total of 52 issues have been addressed for this release.
As always, a big thank you to all the community members who contributed: Alfusainey Jallow, Bingqin Zhou, Cao Manh Dat, John Martin, John Wu, Mike, Olivier Jacquemart, Sergei Morozov, SiuFay, Stefan Miklosovic, Thomas Aregger, and Vadzim Ramanenka.
About Debezium
Debezium is an open source distributed platform that turns your existing databases into event streams, so applications can see and respond almost instantly to each committed row-level change in the databases. Debezium is built on top of Kafka and provides Kafka Connect compatible connectors that monitor specific database management systems. Debezium records the history of data changes in Kafka logs, so your application can be stopped and restarted at any time and can easily consume all of the events it missed while it was not running, ensuring that all events are processed correctly and completely. Debezium is open source under the Apache License, Version 2.0.
Get involved
We hope you find Debezium interesting and useful, and want to give it a try. Follow us on Twitter @debezium, chat with us on Zulip, or join our mailing list to talk with the community. All of the code is open source on GitHub, so build the code locally and help us improve our existing connectors and add even more connectors. If you find problems or have ideas for how we can improve Debezium, please let us know or log an issue.